I'm trying to index multiple web sites using SwishSpiderConfig.pl. Below is the output of the various commands I've used to try to narrow down the problem; the failure is at the bottom. Here it goes!
_________________________________________
swish-e -V
SWISH-E 2.4.2
___________________________________________
perl -v
This is perl, v5.8.0 built for i386-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)
_________________________________________
root(at)not-real.localhost /#spider.pl default http://10.10.10.10/
--- lots of stuff ----
Summary for: http://10.10.10.10/
Connection: Keep-Alive: 16 (8.0/sec)
Duplicates: 21 (10.5/sec)
Off-site links: 764 (382.0/sec)
Skipped: 1 (0.5/sec)
Total Bytes: 176,187 (88093.5/sec)
Total Docs: 15 (7.5/sec)
Unique URLs: 17 (8.5/sec)
root@localhost cgi-bin#
__________________________________________
root@localhost cgi-bin# swish-e -S prog -c swish.conf.good
Indexing Data Source: "External-Program"
Indexing "spider.pl"
External Program found: /app/swish/lib/swish-e/spider.pl
/app/swish/lib/swish-e/spider.pl: Reading parameters from 'default'
Summary for: http://10.10.10.10/
Connection: Keep-Alive: 16 (8.0/sec)
Duplicates: 21 (10.5/sec)
Off-site links: 764 (382.0/sec)
Skipped: 1 (0.5/sec)
Total Bytes: 176,187 (88093.5/sec)
Total Docs: 15 (7.5/sec)
Unique URLs: 17 (8.5/sec)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 577 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
577 unique words indexed.
5 properties sorted.
15 files indexed. 176,187 total bytes. 3,350 total words.
Elapsed time: 00:00:02 CPU time: 00:00:00
Indexing done!
The contents of swish.conf.good:
______________________________________
root@localhost cgi-bin# cat swish.conf.good
IndexDir spider.pl
SwishProgParameters default http://10.10.10.10/
MetaNames swishtitle swishdocpath
StoreDescription TXT* 200000
StoreDescription HTML* <body> 200000
________________________________________
Contents of SwishSpiderConfig.pl:
root@localhost cgi-bin# cat SwishSpiderConfig.pl
my %main_site = (
base_url => 'http://10.10.10.10/',
);
my %news_site = (
base_url => 'http://10.10.10.11/doc',
);
@servers = ( \%main_site, \%news_site );
1;
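The failure at the bottom comes from LWP::RobotUA complaining that no "from" address was given. My understanding (an assumption, not verified against the spider.pl source) is that spider.pl passes each server's `email` setting to LWP::RobotUA as that address. The built-in `default` mode presumably supplies one, which would explain why the default run works while this custom config does not. A version of the config above with that key added might look like the following. The address is a placeholder, not a real value.

```perl
# SwishSpiderConfig.pl: the same two servers as above, with an 'email' key added.
# ASSUMPTION: spider.pl hands each server's 'email' value to LWP::RobotUA
# as its required "from" address. The address below is a placeholder.
my %main_site = (
    base_url => 'http://10.10.10.10/',
    email    => 'webmaster@example.com',   # placeholder: use a real contact address
);

my %news_site = (
    base_url => 'http://10.10.10.11/doc',
    email    => 'webmaster@example.com',   # placeholder: use a real contact address
);

# spider.pl reads the list of server hashes from the package variable @servers.
@servers = ( \%main_site, \%news_site );
1;   # the file must return a true value when loaded
```

Re-running `swish-e -S prog -c swish.conf` afterwards would show whether the "from address required" message goes away.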
_____________________________________________
Contents of swish.conf, used with the SwishSpiderConfig.pl above:
root@localhost cgi-bin# cat swish.conf
IndexDir spider.pl
SwishProgParameters /var/www/cgi-bin/SwishSpiderConfig.pl
______________________________________________
Output of swish-e -S prog -c swish.conf. This is where I'm having problems, and I'm not sure where to go from here. Could the problem be with SwishSpiderConfig.pl?
root@localhost cgi-bin# swish-e -S prog -c swish.conf
Indexing Data Source: "External-Program"
Indexing "spider.pl"
External Program found: /app/swish/lib/swish-e/spider.pl
/app/swish/lib/swish-e/spider.pl: Reading parameters from '/var/www/cgi-bin/SwishSpiderConfig.pl'
LWP::RobotUA from address required at /app/swish/lib/swish-e/spider.pl line 262
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
Received on Mon Mar 22 16:35:47 2004