Re: Problems with sorting German Umlaut

From: Uwe Dierolf <swishe(at)not-real.ubka.uni-karlsruhe.de>
Date: Wed Feb 02 2005 - 14:46:39 GMT

Bill,

Thank you for your quick help.

On Mon, Jan 31, 2005 at 09:25:58AM -0800, Bill Moseley wrote:
> > we are using swish-e 2.4.3.
> > We feed swish-e by putting XML files into it.
> > One XML tag, which is used for sorting the search results,
> > contains German umlauts.
> 
> Properties are pre-sorted at indexing time.  The function that does
> this is called Compare_Properties() in docprop.c.  For string
> properties flagged as "case:compare" it uses the library function
> strncmp(), which does not take LC_COLLATE into consideration.  For
> strings marked as "case:ignore" it uses strncasecmp() which does check
> LC_COLLATE.
> If your property is flagged as case:ignore then check your locale
> (LC_COLLATE) setting.
> There's a strcoll() function to replace strcmp(), but the code would
> need to be rewritten since the strings are not null terminated.
> 
> You can check your property's case setting by running
>    swish-e -f myindex -T index_metanames
> Use PropertyNamesIgnoreCase to set properties to ignore case.

We are using the default "case:ignore" for properties.
We checked the behavior of strcasecmp(), the unbounded counterpart
of strncasecmp() (see the test program and its output below).
It does not take the value of LC_COLLATE into account
(under SuSE Linux 9.x).

Would it be possible for you or other swish-e developers to 
change the swish-e source so that it will use strcoll?
We need correctly sorted results. 
Right now we get A..O..U..Z..Ä..Ö..Ü instead of A..Ä..O..Ö..U..Ü..Z.
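
As a rough illustration of the rewrite Bill mentions, here is a minimal sketch of comparing two strings that are not null terminated with strcoll(). The helper name collate_buffers() and its signature are assumptions made for this example and are not taken from docprop.c; the real Compare_Properties() may be organized quite differently. The idea is simply to copy each buffer into a null-terminated temporary before calling strcoll().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of swish-e: compare two length-delimited
 * buffers according to the current LC_COLLATE setting.
 * Note: strcoll() stops at the first NUL byte, so embedded NULs in the
 * buffers would cut the comparison short. */
int collate_buffers(const char *a, size_t a_len, const char *b, size_t b_len)
{
    char *a_copy = malloc(a_len + 1);
    char *b_copy = malloc(b_len + 1);
    int result;

    if (a_copy == NULL || b_copy == NULL) {
        /* Out of memory: fall back to a plain byte-wise comparison. */
        size_t n = a_len < b_len ? a_len : b_len;
        free(a_copy);
        free(b_copy);
        result = memcmp(a, b, n);
        if (result == 0)
            result = (a_len > b_len) - (a_len < b_len);
        return result;
    }

    /* strcoll() needs NUL-terminated input, so terminate the copies. */
    memcpy(a_copy, a, a_len);
    a_copy[a_len] = '\0';
    memcpy(b_copy, b, b_len);
    b_copy[b_len] = '\0';

    result = strcoll(a_copy, b_copy);

    free(a_copy);
    free(b_copy);
    return result;
}
```

Allocating on every comparison is probably too slow for pre-sorting a large index; a reusable scratch buffer, or sort keys built once per string with strxfrm(), might be a better fit. The program also has to call setlocale() beforehand, otherwise strcoll() orders strings the same way strcmp() does.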

Thanks a lot in advance, Uwe Dierolf

--------------------------------------------------------------------------
Uwe Dierolf                       Tel  0721/608-6076
University Library of Karlsruhe   Fax  0721/608-4886
Straße am Forum                   76049 Karlsruhe / Germany
--------------------------------------------------------------------------


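#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* declares strcasecmp() */

int main(int argc, char **argv) {

  /* Only LC_COLLATE is set; this is the category strcoll() uses. */
  if (setlocale(LC_COLLATE, "de_DE") == NULL) {
    fputs("warning: locale de_DE is not available\n", stderr);
  }

  if (argc != 3) {
    puts("need 2 arguments");
    return 1;
  } else {
    /* Compare the two arguments with each function and print the results. */
    printf("strcasecmp: %d\n"
           "    strcmp: %d\n"
           "   strcoll: %d\n",
           strcasecmp(argv[1], argv[2]),
           strcmp(argv[1], argv[2]),
           strcoll(argv[1], argv[2]));
  }
  return 0;

}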

sortlocale a b         sortlocale ä b
strcasecmp: -1         strcasecmp: 130
    strcmp: -1             strcmp: 130
   strcoll: -1            strcoll: -1

sortlocale a A         sortlocale u ü
strcasecmp: 0          strcasecmp: -135
    strcmp: 32             strcmp: -135
   strcoll: -2            strcoll: -9

sortlocale a ä         sortlocale ü v
strcasecmp: -131       strcasecmp: 134
    strcmp: -131           strcmp: 134
   strcoll: -9            strcoll: -1
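
Since only the strcoll() results above follow the expected German ordering, the following is a minimal sketch of sorting a list of null-terminated strings with qsort() and a strcoll()-based comparator. The names compare_collated() and sort_collated() are made up for this illustration and do not exist in swish-e.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: each array element is a `const char *`. */
static int compare_collated(const void *left, const void *right)
{
    const char *const *l = left;
    const char *const *r = right;
    return strcoll(*l, *r);
}

/* Hypothetical helper: sort `count` strings in place according to the
 * current LC_COLLATE setting. */
void sort_collated(const char **words, size_t count)
{
    qsort(words, count, sizeof words[0], compare_collated);
}
```

A caller would first run setlocale(LC_COLLATE, "de_DE") and check that it did not return NULL. This assumes the locale is installed and that the strings use the encoding that locale expects; without a successful setlocale() call the result is the same byte order that strcmp() gives.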
Received on Wed Feb 2 06:46:40 2005