Re: [swish-e] trying to index pdf files on windows platform

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue Jun 01 2010 - 02:56:48 GMT

Lukas Frei wrote on 5/27/10 2:27 AM:
> good morning everybody
> 
> i've been successfully running swish-e on a windows machine for several
> years now.  a new customer now insists on the indexing of pdf files. ok,
> i went at it, installed the newest 2.47 (with all necessary plugins) and
> tried.  but it does not work.  here is how i do it:
> 
> - the config file as in the attachment www.allpedes.ch.conf
> - running the spider with the also attached allpedes.bat (the following
> merger step with the swish-e.exe is currently not included because of
> the initial problems described below).  the file has been renamed to
> allpedes.txt due to antispam stuff with gmail.
> - the resulting www.allpedes.ch_spider.txt can be viewed here:
>   http://index.nextron.ch/www.allpedes.ch_spider.txt
> 
> interestingly, there are NO pdf files indexed / dumped into the txt
> file, but they are there on the web site
> (http://www.allpedes.ch/de_kataloge.cfm?kid=all for example), and the <a
> href>'s are easily found in the txt-file when searching for '.pdf'.
> 
> what am i missing or doing wrong?
> 

try turning the spider debugging options on, and start by indexing just a single
page to see why the .pdf links are not being followed.

http://swish-e.org/docs/spider.html#debug
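
as a rough cross-check that is independent of swish-e, the sketch below lists the .pdf links on a single saved page and notes whether each one is on the same host as the page itself. it is only an illustration, not part of swish-e: the names LinkCollector and pdf_links and the example.org URLs are made up, and it assumes you have the page's html available as a string. it does not reproduce the spider's own filtering rules, so treat it as a way to narrow things down (for example, links pointing at a different host) before reading the spider's debug output.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pdf_links(html, base_url):
    """Return (absolute_url, same_host) for each link whose path ends in .pdf."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlsplit(base_url).hostname
    found = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        if parts.path.lower().endswith(".pdf"):
            found.append((absolute, parts.hostname == base_host))
    return found


if __name__ == "__main__":
    # Made-up sample markup; substitute the saved html of the real page.
    sample = (
        '<a href="/docs/katalog.pdf">a</a> '
        '<a href="http://files.example.org/b.PDF">b</a>'
    )
    for url, same_host in pdf_links(sample, "http://www.example.org/index.html"):
        print(url, "same host" if same_host else "different host")
```

if a .pdf link shows up here as "different host", that is one thing to look for in the debug output; if they are all on the same host, the debug output should show which other rule is skipping them.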

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Mon May 31 22:56:50 2010