On Wed, Dec 03, 2003 at 04:22:05AM -0800, John Angel wrote:
> I have added chars above ASCII 127 to WordCharacters but it still displays
> blanks instead of them. Where's the catch?
You need to give an example of what's not working.
> BTW, I have noticed that in WordCharacters there are only small caps chars.
Yes, words are lowercased with tolower(), as you noticed, so only
lowercase characters need to be specified.
> UTF-8 support would be great, but I understand it requires major rewrite. Is
> it possible to have at least full 8-bit chars support instead?
It is full 8-bit, but there's a conversion to Latin-1 when using libxml2,
so it may not be 100% 8-bit "clean". I have not tested that with
libxml2.
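To illustrate why a UTF-8 to Latin-1 step can lose characters, here is a
minimal sketch. It is not libxml2's code; utf8_to_latin1() is a made-up
helper written only for this example. It accepts characters up to U+00FF
and reports failure for anything else, since Latin-1 has no byte for them.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only (not libxml2, not Swish-e).
 * Converts NUL-terminated UTF-8 `in` to Latin-1 in `out`, where `cap` is
 * the size of `out` including room for the terminating NUL.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * input is malformed, contains a character above U+00FF, or does not fit. */
int utf8_to_latin1(const unsigned char *in, unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (*in != 0) {
        unsigned int cp;

        if (*in < 0x80) {
            /* Plain ASCII maps to itself. */
            cp = *in++;
        } else if ((*in == 0xC2 || *in == 0xC3) && (in[1] & 0xC0) == 0x80) {
            /* Two-byte sequences with lead byte C2 or C3 cover
             * U+0080..U+00FF, the upper half of Latin-1. */
            cp = ((unsigned int)(*in & 0x1F) << 6) | (in[1] & 0x3F);
            in += 2;
        } else {
            /* Anything else is outside Latin-1 or malformed. */
            return -1;
        }

        if (n + 1 >= cap)
            return -1;          /* no room for this byte plus the NUL */
        out[n++] = (unsigned char)cp;
    }

    if (cap == 0)
        return -1;
    out[n] = 0;
    return (int)n;
}
```

With this sketch, the UTF-8 bytes C3 A4 (a-umlaut) become the single
Latin-1 byte E4, while the three-byte sequence for the euro sign is
rejected. A real converter may substitute or drop such characters instead.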
BTW - First thing swish-e does when starting is:
setlocale(LC_CTYPE, "");
but that's only in the binary. So that might result in problems when
people use the Swish-e API on systems with different locales; that is,
tolower() might not change umlauts on indexing but would on searching.
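A rough sketch of that locale dependence, using only standard C calls.
The helper names lower_byte() and lower_byte_in_locale() are invented for
this example and are not part of Swish-e. The point is that the same
8-bit character can lowercase differently depending on which LC_CTYPE
locale is active when tolower() is called.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not Swish-e code): lowercase one 8-bit character
 * code under whatever LC_CTYPE locale is currently active. */
int lower_byte(int byte)
{
    /* tolower() needs a value representable as unsigned char (or EOF),
     * so mask away any sign extension from a plain char. */
    return tolower((unsigned char)byte);
}

/* Hypothetical helper: switch LC_CTYPE to `locale_name`, then lowercase
 * `byte`. Returns -1 if the requested locale is not available. */
int lower_byte_in_locale(const char *locale_name, int byte)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return lower_byte(byte);
}
```

In the "C" locale, lower_byte_in_locale("C", 0xC4) leaves 0xC4
(A-umlaut in Latin-1) unchanged, because only A-Z are treated as upper
case. Under a Latin-1 locale, if one is installed, the same call would be
expected to return 0xE4. A program that never calls setlocale() starts in
the "C" locale, so an index built one way and searched the other way
could disagree on such characters.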
> Searching through previous posts shows that the problem could be in
> UTF8Toisolat1() and tolower() functions, but I am not sure how to change and
> fix that.
Can you provide a specific example of the problem?
--
Bill Moseley
moseley@hank.org
Received on Wed Dec 3 19:27:26 2003