RE: Indexing pdf and doc

From: Smith, Sarah <sarah.smith(at)not-real.fresnolibrary.org>
Date: Mon Dec 20 2004 - 18:09:01 GMT
I had a typo in my message below: Swish-e was previously indexing only
htm and html files (not pdf). Many thanks,

Sarah


-----Original Message-----
From: Smith, Sarah 
Sent: Monday, December 20, 2004 10:02 AM
To: Multiple recipients of list
Subject: Indexing pdf and doc


Hello,

I had Swish-e running just fine indexing only htm and pdf files on an
intranet. I tried adding in pdf and doc index capabilities and the
indexing task (I have it run once/week) failed. My config file that it
uses is as follows (The new lines in my config file are preceded by an
asterisk, which I added for this email and are not part of the actual
file):

# The directory to index but exclude dir manlibtp and readme files
IndexDir /Data/webroot/docs
FileRules dirname contains manlibtp
*FileRules filename contains readme

#To enable indexing of doc and pdf files
*FileFilter .doc /Program Files/SWISH-E/lib/swish-e/catdoc.exe '-s8859-1 -d8859-1 "%p"'
*FileFilter .pdf /Program Files/SWISH-E/lib/swish-e/pdftotext.exe "%p -"


# Don't want to index .txt as they are mostly readme files
*IndexOnly htm .html .pdf .doc

# How to process
IndexContents HTML .html .htm
*IndexContents TXT2 .pdf .txt .doc 

# Allow searching by title, path
MetaNames swishtitle swishdocpath

# To output body text and enable highlighting
StoreDescription HTML* <content> 256
*StoreDescription TXT 256

# Replaces actual path with URL
ReplaceRules replace /Data/webroot/docs http://url
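
The asterisks in the config above are only markers added for this email, so they have to come off before the text can be used as a real config file. A minimal sketch of doing that, assuming Python is available and the config has been pasted into a string (the function name strip_markers and the sample text are hypothetical, chosen for illustration):

```python
def strip_markers(text: str) -> str:
    """Remove the leading '*' that this email uses to flag newly added lines."""
    cleaned = []
    for line in text.splitlines():
        # Only a '*' in the first column is an email marker.
        # An asterisk elsewhere (e.g. 'HTML*') is part of the config and is kept.
        cleaned.append(line[1:] if line.startswith("*") else line)
    return "\n".join(cleaned)


# Hypothetical sample: a few lines copied from the config in this message.
sample = """IndexContents HTML .html .htm
*IndexContents TXT2 .pdf .txt .doc
StoreDescription HTML* <content> 256
*StoreDescription TXT 256"""

print(strip_markers(sample))
```

This only removes the markers; it does not check whether the directives themselves are valid.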

Sarah
Received on Mon Dec 20 10:09:01 2004