[swish-e] 'No words indexed' problem with PDFs

From: Lyle Jensen <lyle.jensen(at)not-real.gmail.com>
Date: Wed Nov 10 2010 - 20:27:02 GMT
I am using SWISH-e 2.4.7, indexing several hundred PDF files.

If I use the 'prog' method, all of the files will index.  However, due to
Directory Browsing issues, I cannot use the 'prog' method in production.

But if I use the 'fs' method, a few of those PDFs are processed but report
'no words indexed'.

I've also noticed that for the files that *do* report 'n words' indexed, the
word count differs between the 'prog' method and the 'fs' method.

Any ideas as to what is causing this and how to resolve it?

prog swish.conf:
============
IndexDir /perl/bin/perl.exe
IndexFile /inetpub/wwwroot/btsp/swish/swish.index
SwishProgParameters "c:/swish-e/lib/swish-e/spider.pl" "c:/swish-e/sites/btsp/SwishSpiderConfig.pl"
IndexOnly .html .htm .pdf .doc

fs swish.conf:
==========
IndexDir "C:/inetpub/wwwroot/btsp/docs"
IndexFile "c:/inetpub/wwwroot/btsp/swish/fs-swish.index"
IndexReport 3
IndexOnly .doc .htm .html .pdf .mp4 .ppt .pptx
NoContents .mp4 .ppt .pptx
ReplaceRules replace "C:/inetpub/wwwroot/btsp/docs" "http://localhost/btsp/docs"
FileFilter .doc C:/SWISH-E/bin/catdoc "-s8859-1 -d8859-1 \"%p\""
FileFilter .pdf C:/SWISH-E/bin/pdftotext '"%p" -'
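One way to narrow this down is to run pdftotext on each PDF outside of swish-e, invoked roughly as the FileFilter line above does (`pdftotext "<file>" -`, writing text to stdout), and list the files that come back with no words. The sketch below is a hypothetical diagnostic, not part of swish-e: the paths are copied from the fs config above, and the whitespace word count is only a rough stand-in for swish-e's own tokenizer, so its counts will not match swish-e's 'n words' figures exactly.

```python
import os
import subprocess
import sys

# Paths copied from the fs swish.conf above; adjust if your layout differs.
PDFTOTEXT = "C:/SWISH-E/bin/pdftotext"
DOCS_DIR = "C:/inetpub/wwwroot/btsp/docs"


def extract_text(pdf_path):
    """Run pdftotext roughly like the FileFilter line: pdftotext "<file>" -"""
    result = subprocess.run([PDFTOTEXT, pdf_path, "-"], capture_output=True)
    # latin-1 decoding never fails, so odd bytes cannot crash the scan.
    return (result.returncode,
            result.stdout.decode("latin-1"),
            result.stderr.decode("latin-1"))


def count_words(text):
    # Rough whitespace split; swish-e's tokenizer (WordCharacters etc.) differs.
    return len(text.split())


def find_empty_pdfs(root):
    """Yield (path, exit code, stderr) for PDFs whose extracted text has no words."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                code, out, err = extract_text(path)
                if count_words(out) == 0:
                    yield path, code, err.strip()


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else DOCS_DIR
    for path, code, err in find_empty_pdfs(root):
        print(f"{path}: exit={code} {err}")
```

If pdftotext produces text for a file here while the fs run still reports 'no words indexed', the problem is more likely in how swish-e invokes the filter or parses its output than in the PDF itself.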

-- 
Sent by Lyle Jensen


_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Nov 10 15:27:05 2010