> > 6. m$ word docs aren't indexing properly. Unfortunately, I just noticed
> > this and have not researched it at all. I just ran the index again on a
> > subdirectory and noticed that all word docs are showing that only 1 word
> > gets indexed. Here's the config file:
> > In dir "z:/subdirectory/subsub":
> > Word doc 1.doc - Using DEFAULT (HTML2) parser - (1 words)
> > Word doc 2.doc - Using DEFAULT (HTML2) parser - (1 words)
>
> Just need to setup a FileFilter directive that uses catdoc, wvware, a
> SWISH::Filter script or some other word converter.
>
Would you mind giving some examples? I've tried a multitude of things
but I'm definitely not formulating the FileFilter directive correctly
for my setup.
I've located catdoc.exe, doc2txt.pm, and doc2html.pm. When I use the .pm
files as the filter and run the indexer, it opens the .pm files in
WordPad! I then tried passing them as parameters to perl, i.e.:
FileFilter .doc "perl.exe e:/swish-e/lib/swish-e/perl/swish/filters/doc2html.pm"
This didn't raise an error, but each Word doc was followed by "(no words
indexed)".
I also tried cutting and pasting from the documentation to use the
catdoc method, but even though I changed the path, it says it can't find
the executable.
Oh, and another interesting tidbit: despite the fact that it's
supposedly NOT indexing Word documents, it apparently is indexing some
of them. Here's an example search result:
1 October PSO minutes.doc -- rank: 1000
(null)
Last Modified Date: 1998-11-18 14:20:48 Eastern Standard Time
Document Size: 712512
Document Path: file://fileservername/subfolder//path/to/October PSO minutes.doc
Last question: what should I be seeing instead of (null), and what do I
have to do to get the output correct? It does this for pdf, rtf and doc
documents.
Thanks,
Jim
Received on Wed Jun 22 08:53:15 2005