Indexing other document types with SWISH::Filter

From: andy rosbrook <andy_rosbrook(at)not-real.hotmail.com>
Date: Sun Aug 20 2006 - 20:28:10 GMT
Hi all, just a quick question. I've been reading the docs on SWISH::Filter and spider.pl, and I tried the following command to test indexing a PDF document:

swish-filter-test foo.pdf foo.txt

I get the following result:

Document foo.pdf was  filtered.
   Document:     foo.pdf  (foo.pdf)
   Content-Type: text/html
   Parser type:  HTML*

   >Filter used: SWISH::Filters::Pdf2HTML=HASH(0x9dd70f0) ( application/pdf -> text/html )
** /usr/local/bin/swish-filter-test:
  Failed to open 'foo.txt': No such file or directory

What's the problem here? I presume the document was filtered OK?

On another note, does anything need to be included in the spider config to get SWISH::Filter working for PDF documents, or is it automatic?

Thanks,
Andy
Received on Sun Aug 20 13:28:16 2006