Discussion:
Parsing error in 1.9.2
Benjamin Cathey
2010-11-06 17:40:54 UTC
Not sure of the best method to use to get a response. I am having an issue
with my client simply closing its server connection due to a parsing error.

I posted about it here
https://github.com/ln/xmpp4r/issues/#issue/10/comment/522331 but thought
I would ping this list as well.

If anyone has thoughts on how to resolve this, it would be much
appreciated. I found many references to using force_encoding; however, I
have been unable to resolve this successfully (while porting working
code to 1.9.2).

Thanks

Benjamin
Brian Candler
2010-11-08 09:55:54 UTC
Post by Benjamin Cathey
Not sure of the best method to use to get a response. I am having an
issue with my client simply closing its server connection due to a
parsing error.
I posted about it here
https://github.com/ln/xmpp4r/issues/#issue/10/comment/522331 but
The suggestion there about adding
# encoding: utf-8
to source files is almost certainly wrong. String literals in source files
without this tag get encoding "US-ASCII" (*), but the error clearly says the
conflicting encoding is "ASCII-8BIT", so it must be a String read from a
socket, not a literal.
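
A rough sketch of that distinction, for anyone who wants to check it themselves. The helper name read_from_wire is made up for illustration, and a pipe stands in for the real XMPP socket; the literal is explicitly encoded as US-ASCII to mimic what a 1.9 source file without the magic comment would give you.

```ruby
# Hypothetical helper: a pipe stands in for the XMPP socket.
def read_from_wire(bytes)
  reader, writer = IO.pipe
  writer.write(bytes)
  writer.close
  # IO#read with an explicit length returns a binary (ASCII-8BIT) String.
  reader.read(bytes.bytesize)
ensure
  reader.close if reader && !reader.closed?
end

# Mimics a plain literal in a 1.9 source file with no "# encoding:" comment.
def untagged_literal
  "<message/>".encode("US-ASCII")
end

from_wire = read_from_wire("caf\u00e9")
puts from_wire.encoding          # the binary encoding, not US-ASCII
puts untagged_literal.encoding   # US-ASCII
```

So an error that names ASCII-8BIT points at data that came in as raw bytes, which the source-file magic comment does not touch.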

ruby 1.9's string encoding rules are so ludicrously complicated (and
undocumented) that I'm afraid it's quite common for people to offer
well-meaning but incorrect advice.
Post by Benjamin Cathey
If anyone has thoughts on how to resolve this, it would be much
appreciated.
If you want to get to grips with ruby 1.9 string encodings, I have
documented about 200 behaviours here:
https://github.com/candlerb/string19/blob/master/string19.rb

However that's only the tip of the iceberg. In ruby 1.9, as you've found,
methods may not even return at all (i.e. they may raise an exception) if
they don't like the encodings of particular Strings.
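
As a minimal illustration of a method raising rather than returning (not the actual REXML/xmpp4r code path, which I have not traced), here is plain String#+ on a UTF-8 String and a binary String that both contain non-ASCII bytes. The helper names try_concat and relabel_as_utf8 are invented for this sketch, and the force_encoding step assumes the bytes really are UTF-8.

```ruby
# Returns the concatenated String, or the exception class if Ruby refuses.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e.class
end

# force_encoding only relabels the bytes; it does not convert them.
# Assumption: the bytes are in fact valid UTF-8.
def relabel_as_utf8(str)
  str.dup.force_encoding("UTF-8")
end

utf8   = "caf\u00e9"                                   # \u makes this UTF-8
binary = "caf\u00e9".dup.force_encoding("ASCII-8BIT")  # same bytes, binary label

p try_concat(utf8, binary)                    # the exception class
p try_concat(utf8, relabel_as_utf8(binary))   # a 10-byte UTF-8 String
```

If the relabelled data is not valid UTF-8, valid_encoding? returns false and later operations may still raise, so this is a starting point rather than a fix.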

So if you're dealing with a third-party library like REXML, then it ought to
document the encoding-related behaviour for every method which accepts a
String, and also every method which returns a String. Virtually no
libraries do this, so it's down to reverse-engineering each one.
Post by Benjamin Cathey
I have been unable to resolve this successfully (while porting
working code to 1.9.2)
For me, I am sticking permanently with ruby 1.8.x. There are a few things in
1.9 which are improvements, but the string encoding nonsense breaks the
entire language as far as I'm concerned. If and when ruby 1.8 is no longer
maintained, then hopefully different languages will be sufficiently
developed to take its place (Reia may be one).

Regards,

Brian.

(*) Actually, String literals containing \x or \u escapes can get different
encodings even in US-ASCII source. One of many special-case rules.
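
A small sketch of that footnote, which you can run under whatever source encoding you like. The method name literal_encodings is made up; only the \u case is the same everywhere, and the other two depend on the file's source encoding.

```ruby
# Reports the encoding Ruby assigns to three kinds of String literal.
def literal_encodings
  {
    # Plain literal: follows the source encoding of this file.
    :plain => "abc".encoding,
    # \u escape: always UTF-8, whatever the source encoding is.
    :unicode_escape => "\u00e9".encoding,
    # \x escape with a high byte: ASCII-8BIT under a US-ASCII source file,
    # but left as (invalid) UTF-8 under a UTF-8 source file.
    :hex_escape => "\xff".encoding
  }
end

puts "source encoding: #{__ENCODING__}"
literal_encodings.each { |kind, enc| puts "#{kind}: #{enc}" }
```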