
Freezing up on PDFs...

From: Anthony Baratta <anthony(at)not-real.2plus2partners.com>
Date: Fri Jul 30 2004 - 17:53:45 GMT
I've been struggling to use swish-e on a Windows 2000 server. I'm 
spidering the target site, and when the spider hits a PDF file with "errors" 
(Missing 'endstream'), it can lock up.
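
One way to find out which PDFs are likely to trip the filter before spidering is a crude byte-level check. The Python sketch below is my own heuristic, not something swish-e or pdftotext provides. It counts `stream` keywords against `endstream` keywords, and a mismatch suggests the kind of damage behind the "Missing 'endstream'" error. Compressed stream data can occasionally contain these byte sequences by chance, so treat the result as a hint, not a verdict. The names `stream_balance` and `looks_truncated` are hypothetical.

```python
import re
import sys
from typing import Tuple

# Heuristic only: a "stream" keyword not preceded by "end" and followed by
# an end-of-line marks the start of a PDF stream; "endstream" marks its end.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n")
ENDSTREAM_RE = re.compile(rb"endstream")


def stream_balance(data: bytes) -> Tuple[int, int]:
    """Return (number of 'stream' openers, number of 'endstream' closers)."""
    return len(STREAM_RE.findall(data)), len(ENDSTREAM_RE.findall(data))


def looks_truncated(path: str) -> bool:
    """True if the file has more or fewer openers than closers."""
    with open(path, "rb") as fh:
        data = fh.read()
    opens, closes = stream_balance(data)
    return opens != closes


if __name__ == "__main__":
    # Example: python check_streams.py a.pdf b.pdf  (script name is hypothetical)
    for path in sys.argv[1:]:
        status = "MISMATCH" if looks_truncated(path) else "ok"
        print(f"{status}\t{path}")
```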

I've replaced the pdftotext program with the latest version (v3, 1/22/2004) 
and tested it on the problematic PDFs. It throws the same errors, but it does 
create a "text" file containing all the text along with some garbage 
characters. It appears that swish-e is either waiting for an exit code that 
never comes from pdftotext or cannot handle output containing garbage characters.
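
To tell those two possibilities apart, it may help to run pdftotext outside swish-e with a timeout and inspect both the exit code and the output. A minimal Python sketch follows. It assumes pdftotext is on the PATH and accepts `-` as the output file name to write text to standard output (as the Xpdf pdftotext does). `run_filter`, the 30-second default, and the set of characters treated as "garbage" are all hypothetical choices.

```python
import re
import subprocess
import sys
from typing import List, Optional, Tuple

# Control characters other than tab, newline, form feed and carriage return;
# an assumed stand-in for the "garbage characters" in the extracted text.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0e-\x1f\x7f]")


def run_filter(cmd: List[str], timeout: float = 30.0) -> Tuple[Optional[int], str]:
    """Run a text-extraction command and return (exit code, cleaned stdout).

    The exit code is None if the command was killed after `timeout` seconds.
    """
    try:
        proc = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return None, ""
    # Latin-1 maps every byte to a character, so decoding can never fail.
    text = proc.stdout.decode("latin-1")
    return proc.returncode, CONTROL_CHARS.sub(" ", text)


if __name__ == "__main__":
    # Example: python check_pdftotext.py problem.pdf  (script name is hypothetical)
    for pdf in sys.argv[1:]:
        code, text = run_filter(["pdftotext", pdf, "-"])
        if code is None:
            print(f"TIMEOUT\t{pdf}")
        else:
            print(f"exit={code}\tchars={len(text)}\t{pdf}")
```

If a file shows TIMEOUT, pdftotext itself is hanging on it. If pdftotext returns promptly (even with a nonzero exit code), the lockup is more likely on the swish-e side.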

Has anyone else seen this?

Here's some config info, if necessary:

Swish-e v2.4.2 for windows

batch file for spidering (wrapped for readability)

"C:\Program Files\SWISH-E\swish-e.exe"
	-S prog -v 3 -c
	"C:\Program Files\SWISH-E\indexes\SiteName\SiteName.config"
	-f "C:\Program Files\SWISH-E\indexes\SiteName\index.swish-e"

config file

IndexDir perl.exe
SwishProgParameters "C:\\Progra~1\\SWISH-E\\lib\\swish-e\\spider.pl" default "http://www.site.com"
ReplaceRules remove http://www.site.com

IndexContents HTML* .asp .htm .html .pdf
StoreDescription HTML* <body> 320
Received on Fri Jul 30 10:53:59 2004