Indexing pdf and doc

From: Smith, Sarah <sarah.smith(at)>
Date: Mon Dec 20 2004 - 18:08:47 GMT

I had Swish-e running just fine, indexing only htm and html files on an
intranet. I tried adding pdf and doc indexing capabilities, and the
indexing task (which I run once a week) failed. The config file it uses
is as follows. The new lines are preceded by an asterisk, which I added
for this email; the asterisks are not part of the actual config file:

# The directory to index but exclude dir manlibtp and readme files
IndexDir /Data/webroot/docs
FileRules dirname contains manlibtp
*FileRules filename contains readme

#To enable indexing of doc and pdf files
*FileFilter .doc /Program Files/SWISH-E/lib/swish-e/catdoc.exe '-s8859-1 -d8859-1 "%p"'
*FileFilter .pdf /Program Files/SWISH-E/lib/swish-e/pdftotext.exe "%p -"

# Don't want to index .txt as they are mostly readme files
*IndexOnly .htm .html .pdf .doc

# How to process
IndexContents HTML .html .htm
*IndexContents TXT2 .pdf .txt .doc 

# Allow searching by title, path
MetaNames swishtitle swishdocpath

# To output body text and enable highlighting
StoreDescription HTML* <content> 256
*StoreDescription TXT 256

# Replaces actual path with URL
ReplaceRules replace /Data/webroot/docs http://url
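
For anyone who wants to try the listing above, the asterisk markers have to come off first. The following is only a minimal sketch of one way to do that; the function name `strip_markers` is hypothetical and not part of Swish-e. It assumes a marker is always the first character of a line, so the asterisk inside `HTML*` is left alone.

```python
def strip_markers(lines):
    """Remove a leading '*' marker from each line.

    Returns the cleaned lines and the 1-based positions of the lines
    that carried a marker (the lines newly added to the config).
    Only a '*' in the first column counts as a marker; an asterisk
    elsewhere in the line (as in 'HTML*') is kept unchanged.
    """
    clean = []
    added = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("*"):
            clean.append(line[1:])
            added.append(number)
        else:
            clean.append(line)
    return clean, added


if __name__ == "__main__":
    # A few lines copied from the listing in this message.
    sample = [
        "IndexDir /Data/webroot/docs",
        "*FileRules filename contains readme",
        "StoreDescription HTML* <content> 256",
        "*StoreDescription TXT 256",
    ]
    clean, added = strip_markers(sample)
    for line in clean:
        print(line)
    print("newly added lines:", added)
```

The list of marked positions shows at a glance which directives were introduced with the pdf and doc change, which are the first ones to check when the indexing run fails.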

Received on Mon Dec 20 10:08:54 2004