I had a typo. Swish-e was previously indexing only htm and html files
(not pdf).
Many thanks,
Sarah
-----Original Message-----
From: Smith, Sarah
Sent: Monday, December 20, 2004 10:02 AM
To: Multiple recipients of list
Subject: Indexing pdf and doc
Hello,
I had Swish-e running just fine indexing only htm and pdf files on an
intranet. I tried adding in pdf and doc index capabilities and the
indexing task (I have it run once/week) failed. The config file it
uses is as follows (the new lines are preceded by an asterisk, which I
added for this email and which is not part of the actual file):
# The directory to index but exclude dir manlibtp and readme files
IndexDir /Data/webroot/docs
FileRules dirname contains manlibtp
*FileRules filename contains readme
#To enable indexing of doc and pdf files
*FileFilter .doc /Program Files/SWISH-E/lib/swish-e/catdoc.exe '-s8859-1 -d8859-1 "%p"'
*FileFilter .pdf /Program Files/SWISH-E/lib/swish-e/pdftotext.exe "%p -"
# Don't want to index .txt as they are mostly readme files
*IndexOnly htm .html .pdf .doc
# How to process
IndexContents HTML .html .htm
*IndexContents TXT2 .pdf .txt .doc
# Allow searching by title, path
MetaNames swishtitle swishdocpath
# To output body text and enable highlighting
StoreDescription HTML* <content> 256
*StoreDescription TXT 256
# Replaces actual path with URL
ReplaceRules replace /Data/webroot/docs http://url
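For anyone reproducing the listing above: the leading asterisks are email-only markers, so they have to be removed before the text can be used as a config file. A minimal Python sketch of that step follows. The helper name `strip_email_markers` is hypothetical and not part of Swish-e, and the sample lines are copied from the listing only for illustration.

```python
def strip_email_markers(listing: str) -> str:
    """Remove a single leading '*' from each line of the quoted listing."""
    cleaned = []
    for line in listing.splitlines():
        # Only a '*' in the first column is treated as an email marker;
        # asterisks elsewhere in a line (e.g. in "HTML*") are left alone.
        cleaned.append(line[1:] if line.startswith("*") else line)
    return "\n".join(cleaned)


# Sample lines taken from the listing in this message.
sample = "\n".join([
    "IndexContents HTML .html .htm",
    "*IndexContents TXT2 .pdf .txt .doc",
    "StoreDescription HTML* <content> 256",
    "*StoreDescription TXT 256",
])
print(strip_email_markers(sample))
```

This only undoes the email markup. It does not check whether the directives themselves are valid.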
Sarah
Received on Mon Dec 20 10:09:01 2004