Re: Problem with foreign characters

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Dec 21 2001 - 15:26:43 GMT
At 04:30 AM 12/21/2001 -0800, Zambra - Michael wrote:

>The index contains a page with the word "Camarn".
>If I search for "Camarn" the search engine shows the hit, but without the
>accented character. Bill pointed out that the indexer might still working
>wrong because it was indexing "camar" and "n" and interpreting "" as a
>blank. I don't think so, because the engine is unable to find "camar" or "n".

Hi Miguel,

It was never an issue with the script as far as I know, but rather with the
wrong default WordCharacters in swish.  I updated that on Dec 7th.

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/swishe/swish-e/src/config.h

Are you sure you don't have any WordCharacters settings in your config?
Are you sure you are really using a newer version of swish?

Do you get the same results?

~/swish_archive %./swish-e -T index_header | fgrep WordCh
# WordCharacters:
0123456789abcdefghijklmnopqrstuvwxyz
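
To illustrate why the WordCharacters setting matters, here is a rough
sketch in Python of the general idea: a word is a run of characters that
all belong to the configured set, so any character outside the set acts
as a word break. This is only a simplified model, not swish-e's actual
code; the function name split_words and the sample word are made up for
the example.

```python
import re

# Assumption: an ASCII-only set, like one with no accented letters in it.
ASCII_WORD_CHARS = "0123456789abcdefghijklmnopqrstuvwxyz"


def split_words(text, word_chars):
    """Return the runs of characters in text that all belong to word_chars.

    Hypothetical helper for illustration only: anything not listed in
    word_chars is treated as a separator between words.
    """
    pattern = "[" + re.escape(word_chars) + "]+"
    return re.findall(pattern, text.lower())


# With an ASCII-only set the accented letter splits the word in two.
print(split_words("jalapeño", ASCII_WORD_CHARS))        # ['jalape', 'o']

# Once the accented letter is part of the set, the word stays whole.
print(split_words("jalapeño", ASCII_WORD_CHARS + "ñ"))  # ['jalapeño']
```

So if the accented letters are missing from WordCharacters, I would expect
the word to end up in the index as separate fragments rather than as one
word.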


The swish-e list archive script is the one that's in the current swish-e
distribution, too.  Go there and search for Camarn

     http://swish-e.org/Discussion/search/swish.cgi

Here's the entire config file for indexing the archives:

IndexDir ./index_mail.pl
MetaNames swishtitle name email
PropertyNames name email
PropertyNamesDate sent
IndexContents HTML2 .html
StoreDescription HTML2 <body> 100000
UndefinedMetaTags  ignore

So I'm using the default settings in swish for WordChars.

Check your versions once again.

I really doubt that your locale setting would affect this, but if all else
fails 

Bill Moseley
mailto:moseley@hank.org
Received on Fri Dec 21 15:26:58 2001