Re: Indexing of word documents, stored on a UNIX

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Aug 17 2001 - 22:37:23 GMT
At 02:55 PM 08/17/01 -0700, FISHER,JOSEPH (Non-HP-Roseville,ex1) wrote:
>Hi Bill,
>
>Ok, I understand that I need to include a filter file in order to index the
>contents of MS Word documents stored on a Unix system... (As I understand
>it, this was NOT necessary under SWISH 1.3...)

A filter has always been needed.  Swish-e has never natively parsed Word docs.
Rainer added the filter feature to allow indexing other document types.


>I've downloaded and compiled "catdoc"... Catdoc is even referenced in one of
>the filter files under SWISH-E 2.1...
>
>	.../filter-bin/_doc2text.sh

Again, for performance reasons I would not advise using a shell script as
the filter.


>I've installed it in its default location, and made sure that the filter
>file is pointing to the correct directory structure...
>
>But which configuration file should I modify to make SWISH-E see this MS
>Word filter file?

What config files do you have?

The example in the SWISH-CONFIG reference I posted shows:

  FileFilter .doc /usr/local/bin/catdoc "-s8859-1 -d8859-1 '%p'"

That would go in your swish configuration file.  

So your swish.conf might contain:

  IndexOnly .html .htm .doc .txt
  IndexContents HTML .html .htm
  IndexContents TXT .doc .txt
  FileFilter .doc /usr/local/bin/catdoc "-s8859-1 -d8859-1 '%p'"

then run

  ./swish-e -c swish.conf -i /home/docs
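
Before indexing, it can help to see what command line the FileFilter entry above is likely to produce for a single file. The following is only a rough sketch: build_filter_cmd is a hypothetical helper (not part of swish-e), /home/docs/report.doc is a made-up file name, and it assumes that '%p' stands for the path of the document being filtered, as the example above suggests.

```shell
#!/bin/sh
# Sketch: preview the command a FileFilter entry would produce for one file.
# Assumption: '%p' in the option string is replaced by the document's path.

# build_filter_cmd PROGRAM OPTIONS PATH
# Hypothetical helper, not part of swish-e. Paths containing '|' or '&'
# would confuse the sed substitution below; this is a sketch only.
build_filter_cmd() {
    prog=$1
    opts=$2
    path=$3
    # Replace every %p in the option string with the document path.
    expanded=$(printf '%s\n' "$opts" | sed "s|%p|$path|g")
    printf '%s %s\n' "$prog" "$expanded"
}

# Same program and options as the FileFilter line in swish.conf.
build_filter_cmd /usr/local/bin/catdoc "-s8859-1 -d8859-1 '%p'" /home/docs/report.doc
# prints: /usr/local/bin/catdoc -s8859-1 -d8859-1 '/home/docs/report.doc'
```

If you run the printed command by hand and get readable text on stdout, the filter itself is probably fine and any remaining problem is more likely in the configuration.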

If the documentation is unclear, please say so, and say which parts are
confusing or what you think needs to be changed.





Bill Moseley
mailto:moseley@hank.org
Received on Fri Aug 17 22:37:58 2001