Re: various problems on windows

From: Philippe A. <futhark77(at)not-real.gmail.com>
Date: Sat Sep 23 2006 - 16:26:21 GMT
The accent problem was the one bothering me the most, and I have now
solved it. I am using the following config file.

IndexFile index.swish-e
IndexOnly .doc .pdf
IndexContents TXT* .doc .pdf
FileFilter .doc catdoc "%p -s8859-1 -d8859-1"
FileFilter .pdf pdftotext "%p -enc Latin1 -nopgbrk -"
TranslateCharacters :ascii7:

To eliminate accented letters, the trick is the following:
- Ensure the documents are parsed and output with the right encoding. Invoke
the tools manually to find the right combinations (thanks again for that one
Bill). Remember that accents may not print on your screen if your shell has
another encoding, but that's secondary. What's important is how things get
indexed.
- The TranslateCharacters :ascii7: option will strip accents from indexed
words. If you want to see how montréal got indexed, try "swish-e -k m".
TranslateCharacters will also strip accents from search words, if they are
in the right encoding (ascii 8 bit worked for me). The end result will be
that you'll be able to use a search word like "montreal" or "montréal".
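
To illustrate why both steps matter, here is a small Python sketch. It is not
how swish-e implements :ascii7: internally; the function name fold_to_ascii7
is made up for this example, and Unicode decomposition is used as a stand-in
for the translation table. It only shows the idea: text must first be decoded
with the encoding the filter really produced, and only then can accents be
folded away so that indexed words and search words match.

```python
import unicodedata


def fold_to_ascii7(word: str) -> str:
    """Hypothetical helper: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# A filter that outputs Latin-1 writes the accented "e" as the single byte 0xE9.
latin1_bytes = "montréal".encode("latin-1")
# The same word in UTF-8 uses two bytes for the accented letter.
utf8_bytes = "montréal".encode("utf-8")

# Right encoding on both sides: the indexed word and the search word agree.
indexed = fold_to_ascii7(latin1_bytes.decode("latin-1"))
query = fold_to_ascii7("montréal")

# Wrong encoding: UTF-8 bytes read as Latin-1 no longer fold to the same word.
misread = fold_to_ascii7(utf8_bytes.decode("latin-1"))

print(indexed == query)
print(misread == query)
```

The second comparison fails because the mis-decoded bytes turn into two
unrelated characters before any folding happens, which is the situation the
first bullet is meant to avoid.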



Received on Sat Sep 23 09:26:24 2006