Re: Spidering using conf

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Aug 18 2006 - 17:44:32 GMT

Run the spider directly, outside of swish-e, and see what kind of output it is generating.
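
A minimal sketch of what that could look like from a Windows command prompt, assuming perl is on the PATH and reusing the spider.pl path reported in the quoted log. The output file name spider.out is only an illustrative choice. SPIDER_DEBUG is the environment variable mentioned in the quoted conf file; as far as the spider documentation goes, debug messages should go to stderr and the fetched documents to stdout, but treat that as an assumption to verify.

```
rem Optional: turn on spider debugging (values taken from the quoted conf file)
set SPIDER_DEBUG=failed,url,links,headers

rem Run the spider by itself with the same parameters swish-e would pass it.
rem spider.out is a hypothetical file name chosen for this example.
perl E:\INETPUB\WWWROOT\SITE\WINDOWS\lib\swish-e\spider.pl default http://www.swish-e.org > spider.out
```

If spider.out is empty, or contains no document headers such as Path-Name: lines, the spider is not passing any content along, which would explain why swish-e finds no words to index.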

Z scribbled on 8/18/06 9:36 AM:
> I have started from scratch and tried to use spider.pl with the configuration file from the documentation, but I get the error "err: No unique words indexed!"
> 
> #######################################
> The conf file:
> #######################################
> # Use spider.pl for indexing (location of spider.pl set at installation time)
> IndexDir spider.pl
> 
> # Use spider.pl's default configuration and specify the URL to spider
> SwishProgParameters default http://www.swish-e.org
> # Allow extra searching by title, path
> Metanames swishtitle swishdocpath
> 
> # Set StoreDescription for each parser
> # to display context with search results
> StoreDescription TXT* 10000
> StoreDescription HTML* <body> 10000 
> 
> # SPIDER_DEBUG=failed,url,links,headers
> # an attempt at debugging, but more errors ensued
> IndexFile ./test.index
> 
> #######################################
> Command line:
> #######################################
>   
> swish-e.exe -S prog -c test.conf
> 
> #######################################
> Result:
> #######################################
>    
> E:\INETPUB\WWWROOT\SITE\WINDOWS>swish-e.exe -S prog -c test.conf
> Indexing Data Source: "External-Program"
> Indexing "spider.pl"
> External Program found: E:\INETPUB\WWWROOT\SITE\WINDOWS\lib\swish-e/spider.pl
> E:\INETPUB\WWWROOT\SITE\WINDOWS\lib\swish-e\spider.pl: Reading parameters from 'default'
> 
> Summary for: http://www.swish-e.org
> Connection: Close: 1  (0.0/sec)
>       Unique URLs: 1  (0.0/sec)
> Removing very common words...
> no words removed.
> Writing main index...
> err: No unique words indexed!
> .
> 
> #######################################
> 
> Z
> 

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Fri Aug 18 10:44:42 2006