Re: Indexing remote documents

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Sun Jun 05 2005 - 13:52:37 GMT
Thomas Nyman scribbled on 6/5/05 8:24 AM:

> Sorry about the encrypted mail... my mistake. Which file contains
> the necessary parameters used when spidering? I found the
> following on the site:
> 
> #    my %ccenter = (
> 
> #            email       => 'Lance.Perry(at)not-real.ourdomain.com',
> #            base_url    => 'http://our.domain.com/ccenter/',
> #            delay_sec   => '0',
> #            max_depth   => '1',
> #            credentials => 'username:password'
> 
> #   );
> 
> #    @servers = ( \%ccenter );
> 
> The question is: where should this go?
>


It goes in a config file, by default SwishSpiderConfig.pl. You can name it
anything you want (e.g., myconfig), as long as you pass that name on the
command line. Be sure to take out the leading # signs; those "comment out"
the lines in a Perl script.
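For illustration, here is a minimal sketch of what the uncommented file might look like, using the same keys as the snippet quoted above and the archive URL from the original question. The hash name `%archive`, the email address, and the username/password are placeholders. The trailing `1;` is an assumption based on the usual convention for Perl files loaded with `do`, so check it against the sample SwishSpiderConfig.pl that ships with your version.

```perl
# SwishSpiderConfig.pl (or any name you pass to spider.pl on the command line)
# Sketch only: all values below are placeholders.
my %archive = (
    email       => 'you@example.com',             # hypothetical contact address
    base_url    => 'http://192.168.1.2/archive/', # start URL from the original question
    delay_sec   => 0,                             # seconds to wait between requests
    max_depth   => 1,                             # limits how many link levels are followed
    credentials => 'username:password',           # login for the htpasswd-protected area
);

# spider.pl reads the list of servers to crawl from @servers
@servers = ( \%archive );

1;  # assumption: return a true value, as Perl files loaded with do() conventionally do
```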

Call it like this (for example):

  $ spider.pl myconfig | swish-e -S prog -i stdin

To test it, just run:

  $ spider.pl myconfig


which will print the spidered documents to stdout instead of indexing them.
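Since you already have a swish-e conf file with an IndexDir line, another option is to let swish-e start the spider itself with the prog input method: IndexDir names the program and SwishProgParameters passes it the spider config name. A sketch, assuming spider.pl is installed where swish-e can find it and your spider config is called myconfig (the file name swish.conf is just an example):

```text
# swish.conf (example name)
IndexDir spider.pl
SwishProgParameters myconfig
```

which you would then run with:

  $ swish-e -S prog -c swish.conf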

> On 5 Jun 2005, at 13.54, Thomas Nyman wrote:
> 
> 
>>Hi
>>
>>I have created a conf file that contains
>>
>>IndexDir http://192.168.1.2/archive/
>>
>>I wish to index all files found in the "archive" directory on the remote
>>machine. The remote machine is protected with htpasswd, so one needs a
>>password to browse it.
>>
>>When running swish, I receive the following messages:
>>
>>Indexing Data Source: "HTTP-Crawler"
>>Indexing "http://192.168.1.2/archive/"
>>Removing very common words...
>>no words removed.
>>Writing main index...
>>err: No unique words indexed!
>>
>>It seems that it's not indexing any documents.
>>
>>I have not made any particular changes to any other file than my conf
>>file.
>>
>>I can successfully index documents on the same machine that swish is
>>installed on.
>>
>>I'm guessing I'm missing something here but I'm not sure what. I
>>would appreciate any pointers. If someone wants me to send additional
>>info I will.
>>
>>Thanks
>>
>>Thomas
>>
>>

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Sun Jun 5 06:52:37 2005