andy rosbrook scribbled on 8/20/06 3:25 PM:
> Hi all, just a quick question. I've been reading the docs on SWISH::Filter and spider.pl, and I tried the following command to test indexing a PDF doc:
>
> swish-filter-test foo.pdf foo.txt
>
> i get the following result:
>
> Document foo.pdf was filtered.
> Document: foo.pdf (foo.pdf)
> Content-Type: text/html
> Parser type: HTML*
>
> >Filter used: SWISH::Filters::Pdf2HTML=HASH(0x9dd70f0) ( application/pdf -> text/html )
> ** /usr/local/bin/swish-filter-test:
> Failed to open 'foo.txt': No such file or directory
>
> What's the problem here? I presume the document was filtered ok?
>
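Check the usage for swish-filter-test. What you asked it to do was filter two
documents: foo.pdf and foo.txt. I think you were expecting it to write the
filtered content of the input (foo.pdf) to an output file (foo.txt), but that's
not what swish-filter-test does. It treated foo.txt as a second document to
filter, and the "Failed to open" error just means foo.txt does not exist yet.
foo.pdf itself was filtered fine.
See:
  perldoc swish-filter-test
You might have meant:
  $ swish-filter-test -content foo.pdf > foo.txt
which prints the content of the converted/filtered foo.pdf to stdout, where
the shell redirects it into foo.txt.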
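
If you want the "input file, output file" behaviour you were expecting, a small
shell wrapper might do it. This is only a sketch: filter_to_file and FILTER_CMD
are names I made up for illustration, and it assumes the -content option
behaves on your install the way it is described in this thread.

```shell
# Hypothetical helper (not part of swish-e): filter one document and save
# the extracted text to a file.
# FILTER_CMD is an assumed override hook; it defaults to swish-filter-test.
FILTER_CMD="${FILTER_CMD:-swish-filter-test}"

filter_to_file() {
    in="$1"
    out="$2"
    # Refuse to run if the input document is missing.
    if [ ! -f "$in" ]; then
        echo "no such input file: $in" >&2
        return 1
    fi
    # Only the input is passed as an argument; the output file is written
    # by the shell redirect, not by the filter program itself.
    "$FILTER_CMD" -content "$in" > "$out"
}

# Usage: filter_to_file foo.pdf foo.txt
```

The point of the wrapper is just to keep the output name out of the argument
list, so it is never mistaken for a second document to filter.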
> On another note, is there anything that needs to be included in the spider config to get the SWISH::Filter working for pdf documents? Or is it automatic?
>
I believe it is automatic, as long as swish-filter-test works (which in your
case it appears to). But read
  perldoc spider.pl
to be sure.
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Sun Aug 20 18:33:33 2006