Re: Problems with sorting German Umlaut

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Feb 03 2005 - 17:47:41 GMT
On Thu, Feb 03, 2005 at 06:31:03PM +0100, Andreas Seltenreich wrote:
> Sadly, ISO C doesn't know strNcoll. Naively, I'd just copy and
> zero-terminate the strings and feed them to strcoll, using static
> memory to make the penalty bearable. But I'm afraid, will this have to
> be implemented thread safe? Is it okay to introduce a new string
> properties flag "case:locale" or similar to make it runtime
> configurable?

Is the point of making it configurable so that you can fall back to
the old strncasecmp() if you don't need it?

You are right about the penalty.  This code is used in a qsort routine,
so the same strings are compared over and over.  The cost is not paid
once per string but once per comparison.  So we don't really want to
malloc(), compare, and free() inside the qsort compare function.

Might be better to figure out where those strings are allocated and
allocate another byte and make them null-terminated to start with.
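
A minimal sketch of that idea, assuming the values arrive as a pointer
plus a length.  The names dup_terminated(), cmp_coll() and sort_props()
are made up for illustration and are not Swish-e functions.  The
terminator is paid for once at allocation time, so the compare function
can hand the strings straight to strcoll():

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a length-delimited value into a new buffer
 * that has one extra byte for the terminating NUL. */
static char *dup_terminated(const char *data, size_t len)
{
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, data, len);
    s[len] = '\0';
    return s;
}

/* qsort() comparator over an array of char *.  No copying or
 * allocation happens here, only the locale-aware compare. */
static int cmp_coll(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcoll(sa, sb);
}

/* Sort the array in place using the current LC_COLLATE setting. */
static void sort_props(char **props, size_t count)
{
    qsort(props, count, sizeof props[0], cmp_coll);
}
```

The caller would still need setlocale(LC_COLLATE, "") or similar for
strcoll() to use anything other than the "C" locale.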

Could allocate a few static buffers, but then we have the problem of
thread safety.  There's other code that does that, though (not sure if
it's thread safe, as the buffers are somewhat localized).
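
For comparison, here is a rough sketch of the copy-and-terminate
approach using automatic (stack) arrays instead of static ones, which
avoids both malloc() and shared state.  The name strncoll_sketch() and
the COLL_BUF limit are assumptions for illustration only.  Inputs
longer than the buffer are silently truncated, which a real version
would have to handle:

```c
#include <stddef.h>
#include <string.h>

#define COLL_BUF 256   /* assumed upper bound, not taken from Swish-e */

/* Hypothetical strncoll-style compare of two length-delimited strings.
 * Each call copies into its own stack buffers, so no state is shared
 * between threads.  Inputs are truncated to COLL_BUF - 1 bytes. */
static int strncoll_sketch(const char *a, size_t alen,
                           const char *b, size_t blen)
{
    char bufa[COLL_BUF];
    char bufb[COLL_BUF];

    if (alen >= COLL_BUF)
        alen = COLL_BUF - 1;
    if (blen >= COLL_BUF)
        blen = COLL_BUF - 1;

    memcpy(bufa, a, alen);
    bufa[alen] = '\0';
    memcpy(bufb, b, blen);
    bufb[blen] = '\0';

    return strcoll(bufa, bufb);
}
```

The copy still happens on every comparison, so this only removes the
allocation cost and the thread-safety worry, not the repeated work.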

What about local stack arrays two bytes long, and then a char-by-char
test of one-character strings?  That is, have two two-byte local
arrays, copy the chars from each string one by one, and then compare
those two-byte strings with strcoll().
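
Roughly, the char-by-char idea might look like the sketch below.  The
name coll_bytewise() is hypothetical, and both buffers are assumed to
hold at least n bytes.  One caveat: comparing one byte at a time is not
guaranteed to give the same order as strcoll() on the whole strings,
because a locale can treat several characters as one collating element
and can break ties (accents, case) only after looking at the entire
string:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-by-byte collation of the first n bytes of a and b.
 * Each byte is placed in a two-byte local array (byte + NUL) so that
 * strcoll() sees a valid one-character string.  Returns the first
 * nonzero strcoll() result, or 0 if all n positions compare equal. */
static int coll_bytewise(const char *a, const char *b, size_t n)
{
    char ca[2] = { '\0', '\0' };
    char cb[2] = { '\0', '\0' };
    size_t i;
    int r;

    for (i = 0; i < n; i++) {
        ca[0] = a[i];
        cb[0] = b[i];
        r = strcoll(ca, cb);
        if (r != 0)
            return r;
    }
    return 0;
}
```

This keeps everything on the stack, so it is thread safe and needs no
allocation, at the price of one strcoll() call per byte.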

Just one more thing that won't work when we move to utf-8.  (how does
utf-8 sort??  Do some languages sort to the top?)

I guess I'd try and dig up the source for strcoll and modify it to be
strncoll.


-- 
Bill Moseley
moseley@hank.org

Received on Thu Feb 3 09:47:41 2005