Spidering using conf

From: Z <techlistreader(at)not-real.yahoo.com>
Date: Fri Aug 18 2006 - 14:37:59 GMT
I have started from scratch and tried to use spider.pl with the configuration file from the documentation, but indexing fails with the error "err: No unique words indexed!"

#######################################
The conf file:
#######################################
# Use spider.pl for indexing (location of spider.pl set at installation time)
IndexDir spider.pl

# Use spider.pl's default configuration and specify the URL to spider
SwishProgParameters default http://www.swish-e.org
# Allow extra searching by title, path
Metanames swishtitle swishdocpath

# Set StoreDescription for each parser
# to display context with search results
StoreDescription TXT* 10000
StoreDescription HTML* <body> 10000 

# Debugging attempt, left commented out because it only produced more errors:
# SPIDER_DEBUG=failed,url,links,headers
IndexFile ./test.index

#######################################
Command line:
#######################################
  
swish-e.exe -S prog -c test.conf

#######################################
Result:
#######################################
   
E:\INETPUB\WWWROOT\SITE\WINDOWS>swish-e.exe -S prog -c test.conf
Indexing Data Source: "External-Program"
Indexing "spider.pl"
External Program found: E:\INETPUB\WWWROOT\SITE\WINDOWS\lib\swish-e/spider.pl
E:\INETPUB\WWWROOT\SITE\WINDOWS\lib\swish-e\spider.pl: Reading parameters from
'default'

Summary for: http://www.swish-e.org
Connection: Close: 1  (0.0/sec)
      Unique URLs: 1  (0.0/sec)
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
.

#######################################
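
A possible way to narrow this down is to check whether the spider hands any documents to swish-e at all. The sketch below is only an illustration, not part of swish-e: it assumes the spider's output has been captured to a file (the name spider.out is hypothetical) and that the stream follows the "-S prog" layout of header lines (Path-Name, Content-Length, ...), a blank line, then exactly Content-Length bytes of content. The function name list_prog_documents is made up for this example.

```python
import sys


def list_prog_documents(stream):
    """Return [(path_name, content_length), ...] for a binary stream that
    is assumed to follow the swish-e "-S prog" header/content layout."""
    docs = []
    while True:
        line = stream.readline()
        # Be lenient about blank lines before a header block.
        while line and not line.strip():
            line = stream.readline()
        if not line:
            break  # end of stream, no more documents
        headers = {}
        # Read "Name: value" header lines up to the blank separator line.
        while line and line.strip():
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
            line = stream.readline()
        length = int(headers.get("content-length", 0))
        stream.read(length)  # skip over the document body
        docs.append((headers.get("path-name"), length))
    return docs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python list_docs.py spider.out
    with open(sys.argv[1], "rb") as fh:
        found = list_prog_documents(fh)
    for path, size in found:
        print(size, path)
    print(len(found), "document(s) in stream")
```

If this reports zero documents, the spider fetched the URL but passed nothing on for indexing, which would be consistent with "No unique words indexed!"; if it reports documents, the problem more likely lies on the indexing side.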

Z

 		


Received on Fri Aug 18 07:38:04 2006