
[swish-e] Problems indexing PDFs not in root web directory and raw numbers

From: Parker, Peter A CONTRACTOR WRAIR-Wash DC <Peter.Parker(at)not-real.AMEDD.ARMY.MIL>
Date: Mon Sep 24 2007 - 19:20:37 GMT
Greetings fellow Swish-e users,
I have recently completed installation of Swish-e on an Apache server
machine with the following details:

Swish-e version - 2.4.5
Apache version - 2.0.52

I have two short questions:

1) I have noticed that indexing of PDF files seems to be limited to the
root directory. I have PDFs in the root directory and in a subdirectory,
but only PDFs in the root directory ever appear in search results. It is
my understanding that Swish-e automatically recurses into subdirectories
when indexing. Is this not also the case when indexing PDFs?
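To rule out a filesystem problem before blaming the indexer, a quick check like the following could list every PDF under the two IndexDir roots, including subdirectories, and flag any the current user cannot read. This is only a sketch: `find_pdfs` is a hypothetical helper, `follow_links=True` is meant to mirror `FollowSymLinks yes`, and it assumes the script runs as the same user that runs swish-e (since `os.access` checks the current user's permissions).

```python
import os


def find_pdfs(roots, follow_links=True):
    """Walk each root recursively and return sorted (path, readable) pairs
    for every file ending in .pdf (case-insensitive).

    follow_links=True mirrors the "FollowSymLinks yes" setting; note that
    os.walk does not guard against symlink loops when it is enabled.
    """
    found = []
    for root in roots:
        # os.walk descends into every subdirectory beneath root.
        for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
            for name in filenames:
                if name.lower().endswith(".pdf"):
                    path = os.path.join(dirpath, name)
                    # A PDF the indexing user cannot read would never be indexed.
                    found.append((path, os.access(path, os.R_OK)))
    return sorted(found)


if __name__ == "__main__":
    # Same directories as the IndexDir lines in the configuration in this message.
    for path, readable in find_pdfs(["/var/www/html", "/var/www/twiki/data"]):
        print(("ok          " if readable else "UNREADABLE  ") + path)
```

If the subdirectory PDFs show up here as readable, the problem is more likely in how they are filtered than in directory recursion.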

2) I have also noticed that Swish-e does not seem to index numbers
inside Excel or other Office files very well. When I search for a
number I know to be in an indexed file, for example 22469, the search
often yields no results.
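One possible explanation, assuming the Office filter emits numbers with their display formatting: with the WordCharacters, IgnoreFirstChar and IgnoreLastChar values in the configuration in this message, a comma is not a word character, so "22,469" would be split into two words. The sketch below is a rough, simplified model of that word splitting, not swish-e's actual parser; it ignores BeginCharacters/EndCharacters, TranslateCharacters, and whatever text the filter really produces.

```python
import re

# Values copied from the configuration in this message.
WORD_CHARACTERS = "abcdefghijklmnopqrstuvwxyz0123456789.-"
IGNORE_FIRST_CHAR = ".-"
IGNORE_LAST_CHAR = ".-"

# Any run of WordCharacters is a candidate word; every other character splits.
WORD_RUN = re.compile("[" + re.escape(WORD_CHARACTERS) + "]+")


def tokenize(text):
    """Rough approximation of swish-e word splitting (not its real parser)."""
    words = []
    for run in WORD_RUN.findall(text.lower()):
        # Strip leading/trailing characters listed in IgnoreFirstChar/IgnoreLastChar.
        run = run.lstrip(IGNORE_FIRST_CHAR).rstrip(IGNORE_LAST_CHAR)
        if run:
            words.append(run)
    return words


if __name__ == "__main__":
    print(tokenize("Total: 22469."))     # the number survives as one word
    print(tokenize("Total: 22,469.00"))  # the comma splits it into two words
```

Under this model, a search for 22469 would only match if the extracted text contains the digits without a thousands separator.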

Here is the contents of my configuration file:

IndexFile index.swish-e
IndexDir /var/www/html
IndexDir /var/www/twiki/data
FollowSymLinks yes
WordCharacters abcdefghijklmnopqrstuvwxyz0123456789.-
IgnoreFirstChar .-
IgnoreLastChar  .-
BeginCharacters abcdefghijklmnopqrstuvwxyz0123456789
EndCharacters   abcdefghijklmnopqrstuvwxyz0123456789
ReplaceRules remove /var/www/html
FollowSymLinks yes
IndexReport 2
IgnoreWords file:
TranslateCharacters :ascii7:
BumpPositionCounterCharacters |.
IndexOnly .html .htm .doc .ppt .xls .pdf .rtf .txt .jpg .bmp .png
NoContents .jpg .gif .bmp .png .ico
FileFilter .pdf share/doc/swish-e/examples/filter-bin/
IndexContents HTML .pdf
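As a quick sanity check of the configuration above, a small script could flag two things. First, as I read the swish-e documentation, `FileFilter` takes a file suffix followed by the path to a filter program, but the `FileFilter .pdf` line points at a directory (`share/doc/swish-e/examples/filter-bin/`). Second, `FollowSymLinks yes` appears twice. `check_config` is a hypothetical helper, not part of swish-e, and it assumes relative program paths are resolved against the current directory.

```python
import os
import shlex
from collections import Counter


def check_config(text):
    """Report exact duplicate directive lines and FileFilter programs that
    are not executable regular files. Hypothetical helper, not part of swish-e."""
    problems = []
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        lines.append(" ".join(line.split()))  # normalise internal whitespace
        parts = shlex.split(line)
        # Expected shape (per my reading of the docs): FileFilter <suffix> <program> [options]
        if parts[0] == "FileFilter" and len(parts) >= 3:
            program = parts[2]
            if not (os.path.isfile(program) and os.access(program, os.X_OK)):
                problems.append(
                    f"FileFilter {parts[1]}: {program!r} is not an executable file")
    for line, count in Counter(lines).items():
        if count > 1:
            problems.append(f"duplicate line ({count}x): {line}")
    return problems


if __name__ == "__main__":
    import sys
    # Usage (file name is hypothetical): python check_conf.py swish-e.conf
    for conf_path in sys.argv[1:]:
        with open(conf_path) as f:
            for problem in check_config(f.read()):
                print(f"{conf_path}: {problem}")
```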

Any support you can provide is greatly appreciated!

Received on Mon Sep 24 15:20:59 2007