
Re: swish-e 2.4.3 windows 2003 iis success!

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Jun 22 2005 - 16:31:34 GMT
On Wed, Jun 22, 2005 at 08:45:31AM -0700, Revillini, James wrote:
> Would you mind giving some examples?  I've tried a multitude of things
> but I'm definitely not formulating the FileFilter directive correctly
> for my setup.
> 
> I've located catdoc.exe, doc2txt.pm, and doc2html.pm.  When I use the PM
> files as the filter and run the indexer, it opens the pm files up in
> word pad!

That's nice of Windows to do that for you.  Where would Wordpad open
if you were indexing on a remote machine?

> 
> FileFilter .doc "perl.exe e:/swish-e/lib/swish-e/perl/swish/filters/doc2html.pm"

What's doc2html.pm?  Do you mean Doc2html.pm?  That's not a
FileFilter.

Can you find your way through a little Perl?

What I'd try is using the DirTree.pl program.  That should
automatically filter for you.  It uses the SWISH::Filter module which
deals with setting up filtering.

You would likely need to edit DirTree.pl to only fetch the files you
want indexed, but it's not very hard to do.  Then you can run it like
this:

    perl /path/to/DirTree.pl /dir/to/index /other/dir > out.txt

That fetches and filters your documents and writes them to out.txt.  Try
it on a small directory first, of course.  Then use your favorite editor
to look at out.txt and make sure things are being filtered.

Then you import that data into swish like this:

    swish-e -S prog -c config -i stdin < out.txt
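For reference, here is a minimal sketch of the kind of data "-S prog" reads,
assuming plain-text files that need no filtering.  Each document is a
"Path-Name:" header, a "Content-Length:" header giving the exact body size
in bytes, a blank line, and then the body.  The emit_prog function is
hypothetical (it is not part of swish-e), and it does none of the .doc or
.pdf conversion that DirTree.pl gets from SWISH::Filter.

```shell
#!/bin/sh
# Hypothetical helper, not part of swish-e: emit_prog writes every *.txt
# file under directory $1 to stdout in the header/body layout that
# "swish-e -S prog -i stdin" reads. It does NO filtering of .doc/.pdf
# files; that conversion is what DirTree.pl and SWISH::Filter handle.
emit_prog() {
    find "$1" -type f -name '*.txt' | while IFS= read -r file; do
        # Content-Length must be the exact byte count of the body.
        size=$(wc -c < "$file" | tr -d ' ')
        printf 'Path-Name: %s\n' "$file"
        printf 'Content-Length: %s\n' "$size"
        # A blank line ends the headers; the body follows immediately.
        printf '\n'
        cat "$file"
    done
}
```

Used the same way as DirTree.pl above:

    emit_prog /dir/to/index > out.txt
    swish-e -S prog -c config -i stdin < out.txt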

> OH - and another interesting tidbit: despite the fact that it's
> supposedly NOT indexing Word documents, it apparently is indexing some
> of them.  Here's an example search result:

We didn't say it wouldn't index them, but swish (and libxml2) probably
don't do a very good job at parsing the native .doc format.

> Last question: what should I be seeing instead of (null), and what do
> I have to do to get the output correct?  It does this for PDF, RTF and
> DOC documents.

It means you don't have a description defined (see the StoreDescription
config directive).

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Wed Jun 22 09:31:35 2005