Antw: Re: [SWISH-E:417] Swish-e: Problems with Non-ASCII-Chars (e.g. German Umlaut)

From: Rainer Scherg RTC <Rainer.Scherg(at)>
Date: Tue Aug 11 1998 - 11:38:44 GMT
> > But I've got some problems searching for words with German umlauts in the
> > swish database. The problem also occurs when searching for words (with
> > umlauts) in simple HTML pages.

>       We use the C3 API to 'solve' this by first normalizing any
> Unicode (UTF-8, UTF-7). Then we use it again to convert everything into
> 7-bit ASCII using a look-alike conversion that is also part of the C3 API.
> So things like the 'u-umlaut' become a 'u' (rather than the sound-alike
> conversion, which gives you a 'u' and 'eu').
>       This is the text we index.
>       We do the same magic to the search string. Though not very
> beautiful, it does kind of work :-)
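
For illustration, the approach quoted above could look roughly like the sketch below. This is not the C3 API: Python's standard unicodedata module stands in for it, and the names fold_to_ascii, build_index and search are made up for this example. The point is only that the same folding is applied to the indexed text and to the search string.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Look-alike folding to 7-bit ASCII (hypothetical helper, not the C3 API)."""
    # NFKD splits 'ü' into 'u' plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. 'ß', which has no decomposition)
    # is simply dropped here; a real mapping table would handle such cases.
    return stripped.encode("ascii", "ignore").decode("ascii").lower()


def build_index(docs: dict) -> dict:
    """Map each folded word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in fold_to_ascii(text).split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Fold the query exactly like the indexed text, then look it up."""
    return index.get(fold_to_ascii(query), set())
```

With this folding, a query for "München", "MÜNCHEN" or "munchen" hits the same index entry. The price is that distinct words such as "schon" and "schön" collapse into one, which is why it remains a workaround.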


Yes, that would work as a workaround... ;-)
But IMO a workaround doesn't solve the problem...

Does anyone know how special characters (Umlauts, other language special
chars) are stored within the index file? I haven't looked in the swish-e
source yet. 

Is it possible to use a common (mapped) charset to fix this problem?
E.g. by mapping &uuml; to the corresponding ISO-8859-1 character.
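
To make the idea concrete, here is a rough sketch of such a mapping. It says nothing about how swish-e actually stores characters in its index; it only shows HTML entities being decoded and the result being written as ISO-8859-1 bytes. The function name entities_to_latin1 is invented for this example, and html.unescape comes from Python's standard library.

```python
import html


def entities_to_latin1(text: str) -> bytes:
    """Decode HTML entities and encode the result as ISO-8859-1 bytes."""
    # html.unescape turns '&uuml;' into the single character 'ü'.
    decoded = html.unescape(text)
    # Characters that do not exist in ISO-8859-1 are replaced by '?'.
    return decoded.encode("iso-8859-1", errors="replace")
```

If both the indexer and the search front end ran their input through the same mapping, "M&uuml;ller" in an HTML page and "Müller" typed as a Latin-1 query would end up as the same byte sequence.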

ciao Rainer
Received on Tue Aug 11 05:38:49 1998