> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Wednesday, June 22, 2005 12:31 PM
> To: Revillini, James
> Cc: Multiple recipients of list
> Subject: Re: swish-e 2.4.3 windows 2003 iis success!
>
> On Wed, Jun 22, 2005 at 08:45:31AM -0700, Revillini, James wrote:
> > Would you mind giving some examples? I've tried a multitude of things
> > but I'm definitely not formulating the FileFilter directive correctly
> > for my setup.
> >
> > I've located catdoc.exe, doc2txt.pm, and doc2html.pm. When I use the PM
> > files as the filter and run the indexer, it opens the pm files up in
> > word pad!
>
> That's nice of Windows to do that for you. Where would Wordpad open
> if you were indexing on a remote machine?
>
> >
> > FileFilter .doc "perl.exe e:/swish-e/lib/swish-e/perl/swish/filters/doc2html.pm"
>
> What's doc2html.pm? Do you mean Doc2html.pm? That's not a
> FileFilter.
>
> Can you find your way through a little Perl?
>
> What I'd try is using the DirTree.pl program. That should
> automatically filter for you. It uses the SWISH::Filter module which
> deals with setting up filtering.
>
> You would likely need to edit DirTree.pl to only fetch the files you
> want indexed, but it's not very hard to do. Then you can run it like
> this:
>
> perl /path/to/DirTree.pl /dir/to/index /other/dir > out.txt
This worked pretty well with a small directory, but bombed when I tried
to index the big muthah.

I'm getting a ton of these:

  1048 Warning - //fileservername/folder/path/to/files/some-document.doc:
  Use of uninitialized value in waitpid at
  e:\swish-e\lib\swish-e\perl/SWISH/Filter.pm line 1375.

I'm getting a bunch of these:

  Failed to set content type for document
  '//fileservername/folder/path/to/files/some-document.doc'

And right before it bombs I get about a page of these:

  Can't opendir(//fileservername/folder/path/to/a/folder): Invalid argument
  at dirtree.pl line 88

I've tried to find other people with the same problem, but all the reports
I found predate the current release, so I don't know what's been fixed.
No one seems to be having the waitpid issue on line 1375, so that code must
come from a rewrite. Also, as you may have ascertained, Perl isn't my forte,
but I can find my way around it when I need to.
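
For reference, here is a minimal sketch of a directory walk that logs an
unreadable folder and keeps going instead of dying on it, which is the
behavior I'd want when opendir fails. It is written in Python rather than
Perl, it is not how DirTree.pl works, and walk_files and its extensions
list are hypothetical names made up for this illustration.

```python
import os
import sys


def walk_files(top, extensions=(".doc", ".pdf", ".rtf"), log=sys.stderr):
    """Yield paths under `top` whose names end in one of `extensions`.

    Hypothetical helper: directories that cannot be listed are reported
    to `log` and skipped, so one bad folder does not abort the run.
    """
    def report(err):
        # os.walk hands the OSError to this callback instead of raising it.
        print(f"skipping {err.filename}: {err.strerror}", file=log)

    for dirpath, _dirnames, filenames in os.walk(top, onerror=report):
        for name in sorted(filenames):
            # Case-insensitive match, so "REPORT.DOC" is picked up too.
            if name.lower().endswith(extensions):
                yield os.path.join(dirpath, name)
```

Calling list(walk_files("//fileservername/folder")) would collect the
matching paths and write one "skipping ..." line per folder that could
not be opened. Whether that would get past the "Invalid argument" errors
above is untested.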
>
> That fetches and filters your documents and writes to out.txt. Try it
> on a small directory first, of course. Then use your favorite editor
> to look at out.txt to make sure things are being filtered.
>
> Then you import that data into swish like this:
>
> swish-e -S prog -c config -i stdin < out.txt
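
To make sense of what out.txt should contain, here is a rough sketch of
one document record as I understand the -S prog input format: a few
header lines, a blank line, then exactly Content-Length bytes of content.
The header names are from my reading of the swish-e docs and should be
checked against the installed version. The "TXT*" document type used
below is likewise an assumption, prog_record is a hypothetical helper,
and the sketch is in Python rather than Perl.

```python
def prog_record(path, content, mtime=None, doc_type=None):
    """Build one record: header lines, a blank line, then the content bytes.

    Hypothetical helper. Header names follow my reading of the swish-e
    "prog" input format; verify them before relying on this.
    """
    body = content.encode("utf-8") if isinstance(content, str) else content
    headers = [
        f"Path-Name: {path}",
        # The length is counted in bytes, not characters.
        f"Content-Length: {len(body)}",
    ]
    if mtime is not None:
        headers.append(f"Last-Mtime: {int(mtime)}")
    if doc_type is not None:
        headers.append(f"Document-Type: {doc_type}")
    return ("\n".join(headers) + "\n\n").encode("utf-8") + body
```

A script would write each record to standard output in binary mode (in
Python, sys.stdout.buffer.write(prog_record(...))) so that newline
translation on Windows cannot change the byte count, and the result would
be redirected to out.txt.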
>
> > OH - and another interesting tidbit: despite the fact that it's
> > supposedly NOT indexing Word documents, it apparently is indexing some
> > of them. Here's an example search result:
>
> We didn't say it wouldn't index them, but swish (and libxml2) probably
> don't do a very good job at parsing the native .doc format.
>
> > Last question: what should I be seeing instead of (null), and what
> > do I have to do to get the output correct? It does this for
> > pdf, rtf and doc documents.
>
> Means you don't have a description defined.
>
> --
> Bill Moseley
> moseley@hank.org
Received on Wed Jun 22 11:17:46 2005