Re: Problem with foreign characters

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun Dec 02 2001 - 15:26:17 GMT
At 07:07 AM 12/2/2001 -0800, Zambra - Michael wrote:
>
>Hello,
>
>I have installed Swish-e (latest dev) on a Unix system (Sun OS 5.7). I
>think I have done it successfully. I have linked to the xml-parsing and the
>zlib libraries.

The WordCharacters setting is missing that character.  I think that's a
mistake.  I have to look at ISO-8859-1 chars for another project, so I'll
try to review Swish-e's default WordCharacters settings.

Also look at "TranslateCharacters".

# WordCharacters:
0123456789abcdefghijklmnopqrstuvwxyz€ƒŠŒŽšœžŸ


>
>I can index without problems and have used the swish.cgi script in order
>to do searches. It yields correct results, but always omitting the foreign
>characters as [].
>
>I find this VERY strange, because if it finds words with foreign
>characters (I did a search for "Camarn" and it yielded results), why does
>it show the results without these characters in the page title lines?

It's not the swish.cgi script.  It's swish itself, and the query still
works because of this:

Indexing with -T parsed_words indexed_words shows:

White-space found word 'Camarn'
    Adding:[swishtitle:11]   'camar'   Pos:1  Stuct:0x1 ( FILE )
    Adding:[swishtitle:11]   'n'   Pos:2  Stuct:0x1 ( FILE )

So it's indexing it as two words.  When you search for 'Camarn', swish
breaks the query into two words as well, and finds both of the "words".
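
To make the effect concrete, here is a rough sketch in Python of that kind of splitting.  It is not swish's actual code: the function name split_words, the sample word, and the a-z/0-9 character set are all made up for illustration.  It only shows why indexing and searching agree with each other when both split on the same missing character.

```python
import string

# Assumed word-character set for illustration: ASCII digits and
# lowercase letters only (not swish's real default list).
WORD_CHARS = set(string.digits + string.ascii_lowercase)

def split_words(text, word_chars=WORD_CHARS):
    """Lowercase text and split it at every character not in word_chars."""
    words, current = [], []
    for ch in text.lower():
        if ch in word_chars:
            current.append(ch)
        elif current:
            # A non-word character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

# The same splitting is applied to the document and to the query,
# so the two halves still match each other.
indexed = split_words("ma\u00f1ana")
query = split_words("ma\u00f1ana")

# With the character added to the set, the word stays in one piece.
fixed = split_words("ma\u00f1ana", WORD_CHARS | {"\u00f1"})
```

Here indexed and query come out as the same two fragments, which is why the search still finds the document, while fixed is a single word.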

I think the way WordCharacters is set is not great.  It would be nice to be
able to add or subtract characters, and to be able to specify characters in
some escaped way, too.

Anyone have ideas on the best format for such a directive? 



Bill Moseley
mailto:moseley@hank.org
Received on Sun Dec 2 15:26:50 2001