Re: Problems with sorting German Umlaut

From: Andreas Seltenreich <andreas.seltenreich(at)not-real.ubka.uni-karlsruhe.de>
Date: Thu Feb 03 2005 - 19:32:42 GMT
Bill Moseley writes:

> Let me know if you need help.

Thanks.

>> strcoll works flawlessly with utf-8 locales. Here's an example I ran
>> in an utf8-xterm (I used "file" to make sure I am actually typing
>> utf-8):
>
> I don't know utf-8 very well.  Is that because the characters are
> single-byte utf-8 chars?  I suppose if strcoll is locale aware then
> it's utf-8 aware.

Those "Umlauts" are 16 bits wide in utf-8:

$ echo -n üv|hex
0x00000000: c3 bc 76                -                         v
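
The same check can be done from C instead of a hex tool. This is only a minimal sketch: hex_bytes is a name made up for illustration, not a library function, and it assumes the string literal is stored as UTF-8.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the bytes of s into out as space-separated
   lowercase hex, e.g. "c3 bc 76". Output is truncated if out is too small. */
void hex_bytes(const char *s, char *out, size_t outsize)
{
    size_t pos = 0;

    assert(s != NULL && out != NULL);
    if (outsize > 0)
        out[0] = '\0';

    /* Each byte needs at most 3 characters plus the terminating NUL. */
    for (size_t i = 0; s[i] != '\0' && pos + 4 <= outsize; i++) {
        pos += (size_t)snprintf(out + pos, outsize - pos, "%s%02x",
                                i ? " " : "", (unsigned char)s[i]);
    }
}
```

Calling hex_bytes("\xc3\xbcv", buf, sizeof buf) fills buf with the three bytes shown in the dump above: two for the umlaut, one for the "v".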

$ LC_CTYPE=de_DE.utf-8 LC_COLLATE=de_DE.utf-8 ./a.out ü v
strcasecmp: 77
    strcmp: 77
   strcoll: -1

..so this was comparing a double-byte character with a single-byte
one. If I specify a wrong locale, glibc produces the expected results
for the invalid single-byte comparison:

$ LC_CTYPE=de_DE.iso-8859-1 LC_COLLATE=de_DE.iso-8859-1 ./a.out ü v
strcasecmp: 109
    strcmp: 77
   strcoll: -21
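
The source of a.out is not included in this message. A rough sketch of the comparison it performs might look like the following, assuming it simply runs the three functions on its two arguments. The names cmp_result, sign and compare_all are invented for illustration. The raw numbers printed above (77, 109, -21) are whatever glibc happens to return; only the sign is portable, so the sketch keeps just that.

```c
#include <assert.h>
#include <locale.h>   /* setlocale(), to be called by the caller */
#include <string.h>   /* strcmp(), strcoll() */
#include <strings.h>  /* strcasecmp() (POSIX) */

/* Hypothetical result type: one sign (-1, 0 or 1) per comparison function. */
struct cmp_result {
    int casecmp;
    int cmp;
    int coll;
};

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Compare a and b with strcasecmp, strcmp and strcoll.
   strcoll uses whatever LC_COLLATE the caller has selected via setlocale(). */
struct cmp_result compare_all(const char *a, const char *b)
{
    struct cmp_result r;

    assert(a != NULL && b != NULL);
    r.casecmp = sign(strcasecmp(a, b));
    r.cmp     = sign(strcmp(a, b));
    r.coll    = sign(strcoll(a, b));
    return r;
}
```

The caller has to run setlocale(LC_ALL, "") once at startup so that LC_CTYPE and LC_COLLATE from the environment take effect. Without that call the program stays in the "C" locale, where strcoll orders strings the same way strcmp does.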

regards,
Andreas
Received on Thu Feb 3 11:32:48 2005