Re: Problem with foreign characters

From: Bill Moseley <moseley(at)>
Date: Fri Dec 21 2001 - 15:41:07 GMT
At 04:30 AM 12/21/2001 -0800, Zambra - Michael wrote:

>The index contains a page with the word "Camarón".
>If I search for "Camarón" the search engine shows the hit, but without the
>accented character. Bill pointed out that the indexer might still be working
>incorrectly because it was indexing "camar" and "n" and interpreting "ó" as a
>blank. I don't think so, because the engine is unable to find "camar" or "n".

I just tried that on your site after posting.  Odd.  When you first posted,
swish didn't have "ó" in its WordCharacters -- and I was seeing the exact
problem you were seeing.  And as you can see from the swish-e list archive,
it's not working.

Can you run ./swish-e -w Camarón and post that (or at least look for
"Camarón" in that output)?  Does swish report Camarón in the title, or Camar n?

Why it's odd is that the search script reads the WordCharacters setting
from the header of the search results and uses that to split up the words
for highlighting.  But the space in there is coming from swish, unless
something weird is happening on your system.  Or maybe there's a simple
explanation that I don't see right now.
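
To make the suspected failure concrete, here is a minimal Python sketch of
splitting text on a WordCharacters-style set.  It is not the actual search
script (which isn't shown in this thread); the function name split_words and
both character sets are made up for illustration.  The assumption is simply
that any character outside the set acts as a word separator.

```python
import re

def split_words(text, word_chars):
    """Return the runs of characters in text that belong to word_chars.

    Assumption: every character NOT in word_chars is treated as a
    separator, which is how a WordCharacters-style setting is modelled here.
    """
    # Build a regex character class from the allowed characters.
    pattern = "[" + re.escape(word_chars) + "]+"
    return re.findall(pattern, text)

# Hypothetical settings: one without the accented character, one with it.
ascii_only = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
with_accent = ascii_only + "\u00f3\u00d3"  # adds "ó" and "Ó"

print(split_words("Camar\u00f3n", ascii_only))   # word breaks at the "ó"
print(split_words("Camar\u00f3n", with_accent))  # word stays whole
```

With the first set the word comes back as "Camar" and "n"; with the second it
comes back as the single word "Camarón".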

The key to solving problems like this is to use one simple source file, then
use -T indexed_words properties to see exactly what swish is storing.

Then run swish from the command line to see what's coming out.
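
A rough Python sketch of that routine follows.  It only writes a one-page
test case and builds the two command lines; it does not run swish-e itself.
The file names (test.html, test.conf, test.index), the helper names, and the
Latin-1 encoding are assumptions for illustration, so adjust them to match
your own setup.

```python
import tempfile
from pathlib import Path

def write_test_case(workdir, word="Camar\u00f3n"):
    """Write one tiny HTML source file and a config that indexes only it."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    source = workdir / "test.html"
    # Assumption: the real pages are Latin-1 encoded.
    source.write_text(
        f"<html><head><title>{word}</title></head>"
        f"<body>{word}</body></html>\n",
        encoding="latin-1",
    )
    config = workdir / "test.conf"
    config.write_text(
        f"IndexDir {source}\n"
        f"IndexFile {workdir / 'test.index'}\n",
        encoding="latin-1",
    )
    return source, config

def debug_commands(config, index, word="Camar\u00f3n"):
    """Build the indexing (with -T debug output) and search command lines."""
    return [
        ["./swish-e", "-c", str(config), "-T", "indexed_words", "properties"],
        ["./swish-e", "-f", str(index), "-w", word],
    ]

workdir = Path(tempfile.mkdtemp())
source, config = write_test_case(workdir)
for cmd in debug_commands(config, workdir / "test.index"):
    print(" ".join(cmd))
```

Run the two printed commands by hand and compare what the indexer says it
stored with what the search returns.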

Bill Moseley
Received on Fri Dec 21 15:42:36 2001