Re: Indexing other document types with SWISH::Filter

From: Peter Karman <peter(at)>
Date: Mon Aug 21 2006 - 01:33:28 GMT

andy rosbrook scribbled on 8/20/06 3:25 PM:
> Hi all, just a quick question. I've been reading the docs on SWISH::Filter, and I tried the following command to test indexing a PDF doc:
> swish-filter-test foo.pdf foo.txt
> i get the following result:
> Document foo.pdf was  filtered.
>    Document:     foo.pdf  (foo.pdf)
>    Content-Type: text/html
>    Parser type:  HTML*
>    >Filter used: SWISH::Filters::Pdf2HTML=HASH(0x9dd70f0) ( application/pdf -> text/html )
> ** /usr/local/bin/swish-filter-test:
>   Failed to open 'foo.txt': No such file or directory
> What's the problem here? I presume the document was filtered OK?

Check the usage for swish-filter-test. What you asked it to do was filter two
documents: foo.pdf and foo.txt. I think you were expecting it to read an input
(foo.pdf) and write the result to an output (foo.txt), but that's not what
swish-filter-test does. The "Failed to open" error just means it went looking
for a second input document named foo.txt and found none.

See perldoc swish-filter-test

You might have meant:

  $ swish-filter-test -content foo.pdf > foo.txt

which prints the converted/filtered content of foo.pdf to stdout; the shell's
redirect (>) is what saves it in foo.txt.
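
If you want the "input file, output file" behaviour you were expecting, a small shell wrapper can supply it. This is only a rough sketch, assuming a POSIX shell: the function name filter_to_file and the FILTER_CMD variable are made up for illustration and are not part of swish-e. The only piece taken from the command above is swish-filter-test -content.

```shell
#!/bin/sh
# Hypothetical helper: convert one document and save the filtered text.
# FILTER_CMD is an illustrative override, not a swish-e setting.
FILTER_CMD=${FILTER_CMD:-"swish-filter-test -content"}

filter_to_file() {
    in=$1
    out=$2
    # Only the input has to exist; the shell creates the output file.
    if [ ! -r "$in" ]; then
        echo "cannot read input: $in" >&2
        return 1
    fi
    # $FILTER_CMD is left unquoted on purpose so "-content" splits off
    # as a separate option. The redirect, not the tool, names the output.
    $FILTER_CMD "$in" > "$out"
}

# Usage: filter_to_file foo.pdf foo.txt
```

The point of the sketch is that every positional argument is treated as a document to filter, so the output file has to come from the shell's redirection.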

> On another note, is there anything that needs to be included in the spider config to get the SWISH::Filter working for pdf documents? Or is it automatic?

I believe it is automatic, as long as swish-filter-test works (which in your 
case it appears to). But be sure to read


to be sure...

Peter Karman  .  .  peter(at)
Received on Sun Aug 20 18:33:33 2006