spanish searching, enyas etc...

From: Brad Miele <brad(at)>
Date: Wed Feb 25 2004 - 18:57:32 GMT
Sorry for two in two days. I hope that I can solve this one as easily as

Finally, I had time to work on my multilanguage search.

Mostly it is going swimmingly using the following conf:

WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.
IgnoreFirstChar .-
IgnoreLastChar  .-
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters   abcdefghijklmnopqrstuvwxyz0123456789
IndexReport 2
IgnoreTotalWordCountWhenRanking yes
IndexComments 0
BumpPositionCounterCharacters |.
FuzzyIndexingMode Stemming_es
DefaultContents XML
MetaNames sphotogs categories sort_date qphotographer image_restrictions id agents_off crop profile
UndefinedMetaTags index
PropertyNamesDate sort_date
PropertyNames id photographer subject released orig_id date_shot weight  image_restrictions short_caption tsize siteowner adweight
SwishProgParameters sp

This hands off to my script, which creates XML with the declaration:

<?xml version='1.0' encoding="ISO-8859-1"?>

The indexing is going through great, and a search on the word espana
(with the ~ over the n) works correctly, as do searches on all words
with Spanish characters.

It would seem that the problem has been solved, but it has not. Now I need
espana (without the Spanish n) to return the same set of results, but
alas, I have not been able to get that working. I tried the
TranslateCharacters option, but that ended up removing all the results.

I could do some kung fu in the indexer to index both versions of the
words, but it seems clunky.
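
To make the "index both versions" idea concrete, here is a rough sketch
of what that kung fu might look like. It is in Python purely for
illustration, not the language of the actual sp script, and the helper
names fold_accents and index_forms are made up; it assumes the words are
already decoded from ISO-8859-1 into text strings.

```python
import unicodedata


def fold_accents(word):
    """Return the word with accents and tildes removed (hypothetical helper)."""
    # NFKD splits "ñ" into a plain "n" followed by a combining tilde.
    decomposed = unicodedata.normalize("NFKD", word)
    # Keep only the base characters and drop the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_forms(word):
    """Return the forms to emit for one word: the original plus the folded one."""
    folded = fold_accents(word)
    # Words without accented characters are emitted only once.
    return [word] if folded == word else [word, folded]
```

The script would then write every form returned by index_forms into the
XML it hands to the indexer, so that either spelling finds the document.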

So any advice is appreciated, and since I am most likely leaving out
critical pieces, I am fully prepared for Bill to ask me for more
specifics.

 Brad Miele
 Technology Director
 (207) 828-8787 x110

 Oh, I don't blame Congress.  If I had $600 billion at my disposal, I'd
be irresponsible, too.
		-- Lichty & Wagner
Received on Wed Feb 25 10:57:37 2004