Tuesday, November 13, 2007

Bug in XP

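The bug: type "bush hid the facts" into Notepad, save the file, and open it again, and the text comes back as Chinese characters. Two different but similar explanations can be given for this.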

The first is that Notepad saves the string as one byte per character (ASCII). When it reads the file back, it groups those bytes as Unicode (two bytes per character) instead of ASCII, and that garbles the text. Here's the example:

Take "bush hid the facts". The byte values of the string in hex (any hex editor will show them) are:

62 75 73 68 20 68 69 64 20 74 68 65 20 66 61 63 74 73
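If you don't have a hex editor handy, a couple of lines of Python give the same dump. This is only an illustration, not anything Notepad runs. It assumes Python 3.8 or later, where `bytes.hex()` accepts a separator argument:

```python
text = "bush hid the facts"

# Saved as ASCII/ANSI, each character becomes exactly one byte.
data = text.encode("ascii")

# Show the bytes the way a hex editor would, separated by spaces.
print(data.hex(" "))
# 62 75 73 68 20 68 69 64 20 74 68 65 20 66 61 63 74 73
```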

Group the bytes in pairs, each pair forming one 16-bit Unicode character (with the second byte of each pair as the high byte), and you get:

7562 6873 6820 6469 7420 6568 6620 6361 7374
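The pairing can be checked mechanically. The sketch below reads the 18 bytes as nine little-endian unsigned 16-bit integers with Python's `struct` module. Little-endian is the byte order of Windows' "Unicode" (UTF-16LE), which is why the second byte of each pair ends up as the high byte. Again, this illustrates the arithmetic and is not Notepad's own code:

```python
import struct

data = "bush hid the facts".encode("ascii")

# "<" = little-endian, "H" = unsigned 16-bit; 18 bytes give 9 code units.
count = len(data) // 2
units = struct.unpack("<%dH" % count, data)

print(" ".join(f"{u:04x}" for u in units))
# 7562 6873 6820 6469 7420 6568 6620 6361 7374
```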

Each of these values is a code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF), so each one is displayed as a Chinese character.
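Python can decode the bytes as UTF-16LE and look up each resulting character in its Unicode database. This makes it easy to confirm which characters Notepad ends up showing. It is a sketch using only the standard `unicodedata` module:

```python
import unicodedata

data = "bush hid the facts".encode("ascii")

# Reinterpret the ASCII bytes as UTF-16LE, as the "Unicode" reading does.
garbled = data.decode("utf-16-le")

for ch in garbled:
    # Prints the code point and its official Unicode name,
    # e.g. "U+7562 CJK UNIFIED IDEOGRAPH-7562".
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```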

So the cause is the coincidence that the 18 ASCII bytes also happen to form 9 Unicode characters, combined with Windows' inability to determine the file's encoding correctly: an ANSI text file carries no marker of its encoding, so Notepad has to guess.
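To see why this is a coincidence, here is a small hypothetical helper, `decodes_as_cjk`, written only for this illustration. It tests whether a string's ASCII bytes, read as UTF-16LE, turn into nothing but CJK Unified Ideographs. It checks only the byte arithmetic. It is not a model of how Notepad decides which encoding to use:

```python
def decodes_as_cjk(text: str) -> bool:
    """Hypothetical helper: True if the ASCII bytes of `text`, read back as
    UTF-16LE, form only CJK Unified Ideographs (U+4E00..U+9FFF)."""
    data = text.encode("ascii")
    if len(data) % 2:
        # An odd byte count cannot be split evenly into 16-bit units.
        return False
    decoded = data.decode("utf-16-le")
    return all(0x4E00 <= ord(ch) <= 0x9FFF for ch in decoded)


print(decodes_as_cjk("bush hid the facts"))   # True: 18 bytes -> 9 ideographs
print(decodes_as_cjk("bush hid the facts."))  # False: 19 bytes, odd length
print(decodes_as_cjk("Bush Hid The Facts"))   # False: "20 48" gives U+4820
```

In the lowercase phrase, every space falls on the low byte of a pair. Every high byte is then a lowercase letter (0x61 to 0x7A), which keeps each code point inside the ideograph block.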

The second explanation is slightly different, but the basics are the same: the difference between ASCII and Unicode. It comes down to Notepad's defaults. When you save the file, the "Encoding" drop-down defaults to ANSI, so Notepad saves the text as ANSI. But if you do File -> Open, the default encoding is set to Unicode. The same thing happens when you double-click a saved file: Notepad knows the path, but not the encoding, so it falls back to the default Unicode encoding, which produces the Chinese characters explained above.
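The save/open mismatch described above can be simulated in Python with a temporary file. The text is written with a single-byte code page and read back as UTF-16LE, which is what Notepad calls "Unicode". The choice of `cp1252` as the "ANSI" code page is an assumption (it is the usual Western European Windows default). For this all-ASCII string, any ANSI code page would produce the same bytes:

```python
import os
import tempfile

text = "bush hid the facts"

# Save step: write the text as "ANSI" (cp1252 assumed), one byte per character.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="cp1252", newline="") as f:
    f.write(text)

# Open step: read the same bytes back as UTF-16LE ("Unicode").
with open(path, encoding="utf-16-le", newline="") as f:
    reopened = f.read()
os.remove(path)

print(reopened == text)          # False
print(len(text), len(reopened))  # 18 9
```

The file's bytes never change. Only the encoding used to interpret them differs between the two steps.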