Re: 8-bit chars

From: Bill Moseley <moseley(at)>
Date: Wed Dec 03 2003 - 19:23:51 GMT
On Wed, Dec 03, 2003 at 04:22:05AM -0800, John Angel wrote:
> I have added chars above ASCII 127 to WordCharacters but it still displays 
> blanks instead of them. Where's the catch?

You need to give an example of what's not working.

> BTW, I have noticed that in WordCharacters there are only small caps chars.

Yes, words are lowercased with "tolower()", as you noticed, so only
lowercase characters need to be specified.
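
As a rough illustration of why lowercase entries are enough, here is a
minimal sketch. It is not Swish-e's actual code: the helper name
is_indexable_word and the character set passed to it are hypothetical.
It lowercases each byte with tolower() and then checks it against a
WordCharacters-style string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Swish-e.
 * Lowercases `word` in place, byte by byte, and returns 1 if every
 * lowercased byte appears in `wordchars`. Returns 0 at the first byte
 * that does not (bytes after that point are left unchanged). */
static int is_indexable_word(char *word, const char *wordchars)
{
    size_t i;

    for (i = 0; word[i] != '\0'; i++) {
        /* Cast to unsigned char: passing a negative char value to
         * tolower() is undefined behavior. */
        word[i] = (char)tolower((unsigned char)word[i]);
        if (strchr(wordchars, word[i]) == NULL)
            return 0;
    }
    return 1;
}
```

Because the comparison happens after lowercasing, "Hello" is accepted
against a set that lists only a-z, while a word containing a character
missing from the set is rejected.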

> UTF-8 support would be great, but I understand it requires major rewrite. Is 
> it possible to have at least full 8-bit chars support instead?

It is full 8-bit, but there's a conversion to Latin1 when using libxml2
so it may not be 100% 8-bit "clean".  I have not tested that with

BTW - First thing swish-e does when starting is:

      setlocale(LC_CTYPE, "");

but that's only in the binary.  (So that might result in problems when
people use the Swish-e API on systems with different locales -- that is,
tolower() might not change umlauts on indexing but would on searching.)
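
To make the locale dependence concrete, here is a small sketch. The
helper lower_in_locale is hypothetical, and it resets LC_CTYPE to "C"
afterwards rather than restoring whatever was set before, which is a
simplification.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: lowercase one byte under the named LC_CTYPE
 * locale. Returns -1 if setlocale() rejects the locale name.
 * Simplification: resets LC_CTYPE to "C" instead of restoring the
 * previously active locale. */
static int lower_in_locale(const char *locale_name, unsigned char c)
{
    int result;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    result = tolower(c);
    setlocale(LC_CTYPE, "C");
    return result;
}
```

In the "C" locale, lower_in_locale("C", 0xC4) returns 0xC4 unchanged,
because only A-Z are treated as uppercase there.  Under an ISO-8859-1
locale (the exact locale name varies by system, so treat any name as an
assumption) the same byte would typically map to 0xE4.  A program that
never calls setlocale() runs in the "C" locale, which is how an index
built by the binary and a search made through the API could disagree.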

> Searching through previous posts shows that the problem could be in 
> UTF8Toisolat1() and tolower() functions, but I am not sure how to change and 
> fix that.

Can you provide a specific example of the problem?

Bill Moseley
Received on Wed Dec 3 19:27:26 2003