Bill Moseley writes:
> Let me know if you need help.
Thanks.
>> strcoll works flawlessly with utf-8 locales. Here's an example I ran
>> in a utf8-xterm (I used "file" to make sure I am actually typing
>> utf-8):
>
> I don't know utf-8 very well. Is that because the characters are
> single-byte utf-8 chars? I suppose if strcoll is locale aware then
> it's utf-8 aware.
No, those "Umlauts" take two bytes (16 bits) each in utf-8. Here ü is
c3 bc and v is 76:
$ echo -n üv|hex
0x00000000: c3 bc 76 - üv
$ LC_CTYPE=de_DE.utf-8 LC_COLLATE=de_DE.utf-8 ./a.out ü v
strcasecmp: 77
strcmp: 77
strcoll: -1
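So this was comparing a two-byte character with a single-byte one.
strcmp and strcasecmp only see the byte difference (0xc3 - 0x76 = 77),
while strcoll correctly sorts ü before v. If I deliberately specify
the wrong locale (iso-8859-1, while the terminal is sending utf-8),
glibc produces the expected results for the now invalid single-byte
comparison: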
$ LC_CTYPE=de_DE.iso-8859-1 LC_COLLATE=de_DE.iso-8859-1 ./a.out ü v
strcasecmp: 109
strcmp: 77
strcoll: -21
regards,
Andreas
Received on Thu Feb 3 11:32:48 2005