Multiple web sites

From: Lung.Allen <Allen.Lung(at)not-real.ftb.ca.gov>
Date: Tue Mar 23 2004 - 00:35:47 GMT
I'm trying to index multiple web sites using SwishSpiderConfig.pl. The output below might help; the failure is at the bottom. I've attached the output of the various commands I've used to try to narrow down the problem. Here goes!
_________________________________________
swish-e -V
SWISH-E 2.4.2
___________________________________________
perl -v
This is perl, v5.8.0 built for i386-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)
_________________________________________
root@localhost /# spider.pl default http://10.10.10.10/
--- lots of stuff ----
Summary for: http://10.10.10.10/
Connection: Keep-Alive:      16  (8.0/sec)
            Duplicates:      21  (10.5/sec)
        Off-site links:     764  (382.0/sec)
               Skipped:       1  (0.5/sec)
           Total Bytes: 176,187  (88093.5/sec)
            Total Docs:      15  (7.5/sec)
           Unique URLs:      17  (8.5/sec)
root@localhost cgi-bin#
__________________________________________
root@localhost cgi-bin# swish-e -S prog -c swish.conf.good
Indexing Data Source: "External-Program"
Indexing "spider.pl"
External Program found: /app/swish/lib/swish-e/spider.pl
/app/swish/lib/swish-e/spider.pl: Reading parameters from 'default'

Summary for: http://10.10.10.10/
Connection: Keep-Alive:      16  (8.0/sec)
            Duplicates:      21  (10.5/sec)
        Off-site links:     764  (382.0/sec)
               Skipped:       1  (0.5/sec)
           Total Bytes: 176,187  (88093.5/sec)
            Total Docs:      15  (7.5/sec)
           Unique URLs:      17  (8.5/sec)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 577 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
577 unique words indexed.
5 properties sorted.
15 files indexed.  176,187 total bytes.  3,350 total words.
Elapsed time: 00:00:02 CPU time: 00:00:00
Indexing done!

Contents of swish.conf.good
______________________________________

root@localhost cgi-bin# cat swish.conf.good
IndexDir spider.pl
SwishProgParameters default http://10.10.10.10/
MetaNames swishtitle swishdocpath
StoreDescription TXT* 200000
StoreDescription HTML* <body> 200000
________________________________________
contents of SwishSpiderConfig.pl

root@localhost cgi-bin# cat SwishSpiderConfig.pl
my %main_site = (
    base_url   => 'http://10.10.10.10/',
);

my %news_site = (
    base_url   => 'http://10.10.10.11/doc',
);

@servers = ( \%main_site, \%news_site );
1;

_____________________________________________
Contents of swish.conf, used with the SwishSpiderConfig.pl above.

root@localhost cgi-bin# cat swish.conf
IndexDir spider.pl
SwishProgParameters /var/www/cgi-bin/SwishSpiderConfig.pl
______________________________________________
Output of swish-e -S prog -c swish.conf. This is where I'm having problems, and I'm not sure where to go from here. I'm thinking the problem is with SwishSpiderConfig.pl?

root@localhost cgi-bin# swish-e -S prog -c swish.conf
Indexing Data Source: "External-Program"
Indexing "spider.pl"
External Program found: /app/swish/lib/swish-e/spider.pl
/app/swish/lib/swish-e/spider.pl: Reading parameters from '/var/www/cgi-bin/SwishSpiderConfig.pl'
LWP::RobotUA from address required at /app/swish/lib/swish-e/spider.pl line 262
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
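
The "from address required" message comes from LWP::RobotUA, which will not create a robot user agent without an agent name and a contact ("from") address. Neither server entry in the SwishSpiderConfig.pl above sets one, whereas the "default" run presumably gets spider.pl's built-in values. Below is a minimal sketch of the same config with those fields filled in. It assumes spider.pl reads them from per-server "email" and "agent" keys, as the sample config shipped with swish-e does; the address and agent string are placeholders, not values from this setup.

```perl
# SwishSpiderConfig.pl -- sketch only.
# Assumption: spider.pl passes each server's 'agent' and 'email'
# on to LWP::RobotUA. Both values below are placeholders.

my %main_site = (
    base_url => 'http://10.10.10.10/',
    email    => 'webmaster@example.invalid',  # placeholder contact address
    agent    => 'swish-e spider',             # placeholder agent name
);

my %news_site = (
    base_url => 'http://10.10.10.11/doc',
    email    => 'webmaster@example.invalid',  # placeholder contact address
    agent    => 'swish-e spider',             # placeholder agent name
);

# spider.pl looks for the list of servers in @servers
@servers = ( \%main_site, \%news_site );

# the config file must return a true value
1;
```

If this is the cause, rerunning swish-e -S prog -c swish.conf with the changed file should get past line 262 of spider.pl and print a summary for each base_url.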
Received on Mon Mar 22 16:35:47 2004