Re: Problem Indexing PDFs with

From: Bill Moseley <moseley(at)>
Date: Wed Dec 04 2002 - 15:41:10 GMT
On Wed, 4 Dec 2002 wrote:

> I'm running Swish-E 2.2.1 on a Solaris 9 box.  I got a filesystem index
> working flawlessly, with PDFs being parsed as TXT using pdftotext.
> Now, I'm trying to get it working using the prog method and  The
> crawl seems to work fine and HTML files get indexed using the HTML2
> parser.  I cannot get PDF files to index correctly.  When I tried the pdf
> function internal to, the PDF files were parsed as HTML2s and only
> between 5 and 8 words per file were indexed.  I know this is wrong because
> the same PDF files with the filesystem index yield many more indexed
> words.
> FilterDir /opt/sfw/bin
> FileFilter .pdf pdftotext "'%p' -"

Or you can do the filtering in the program itself (the one swish-e runs with the prog method).
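Either way, the work is the same: pdftotext turns the PDF into plain text, and swish-e should index that text with a TXT parser rather than HTML2. What follows is a minimal Python sketch of both pieces. It is not how swish-e or its spider actually implement this. It assumes pdftotext is on the PATH. It also assumes the prog method reads each document as a block of headers (Path-Name, Content-Length in bytes, Document-Type), then a blank line, then the body. The helper names (expand_filter_options, pdftotext_argv, extract_pdf_text, prog_record) are hypothetical, and so is the "TXT*" Document-Type value. Check the swish-e documentation for your version before relying on them.

```python
import shlex
import subprocess


def expand_filter_options(options: str, path: str) -> str:
    """Substitute %p in a FileFilter-style option string with the file path.

    Illustrates the idea behind: FileFilter .pdf pdftotext "'%p' -"
    The single quotes keep paths containing spaces together, and "-" tells
    pdftotext to write to stdout. This sketch does not handle paths that
    themselves contain a single quote.
    """
    return options.replace("%p", path)


def pdftotext_argv(path: str) -> list[str]:
    """Build the argument vector that the FileFilter line above would run."""
    return ["pdftotext"] + shlex.split(expand_filter_options("'%p' -", path))


def extract_pdf_text(path: str) -> str:
    """Run pdftotext on a local PDF and return its text (needs pdftotext on PATH)."""
    result = subprocess.run(pdftotext_argv(path), capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def prog_record(path_name: str, text: str, doc_type: str = "TXT*") -> bytes:
    """Wrap one document in prog-method style headers.

    Assumption: swish-e expects Path-Name, Content-Length (byte count of the
    body) and an optional Document-Type header, ended by a blank line.
    """
    body = text.encode("utf-8")
    header = (
        f"Path-Name: {path_name}\n"
        f"Content-Length: {len(body)}\n"
        f"Document-Type: {doc_type}\n"
        "\n"
    )
    return header.encode("utf-8") + body
```

In a spidering program you might write `sys.stdout.buffer.write(prog_record(url, extract_pdf_text(local_copy)))` for each fetched PDF. Content-Length is computed on the encoded bytes rather than on the string length, because non-ASCII characters take more than one byte.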

Bill Moseley
Received on Wed Dec 4 15:41:23 2002