Re: swish-e only spiders the server it started on

From: Rob Mangiafico <rmang(at)not-real.lexiconn.com>
Date: Fri May 19 2006 - 15:54:24 GMT
> I solved my problem by changing the spider.pl program. I am using 
> swish-e 2.4.3. I added a configuration option "follow_url" to the spider 
> config file, and the necessary code to spider.pl to handle it.  The 
> config file then had in it:
>     ...
>     base_url => 'http://aaa.com/index.html',
>     follow_url => ['http://bbb.com', 'http://ccc.com'],
>      ...  etc ...
> ...
> Following Bill's suggestion for spider.pl and using max_depth may be 
> enough for you. My code is available if you want it.

This code could be quite handy for a lot of people. We had to do 
something similar to what Bill suggested to set up a three-site search 
with independent content. Please do share it with the community.  :)
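
For anyone finding this thread later, here is a rough sketch of the general 
shape of a multi-site spider config using only the stock options, assuming 
one @servers entry per site. This is not necessarily what Bill proposed, 
and it does not use the custom follow_url option quoted above. The host 
names are the placeholders from the quoted message, and the email address 
and max_depth values are made up for illustration.

```perl
# Hypothetical spider config sketch: one entry per site.
# spider.pl reads this file and uses the @servers list it defines.
@servers = (
    {
        base_url  => 'http://aaa.com/index.html',
        email     => 'admin@aaa.com',    # placeholder contact address
        max_depth => 5,                  # placeholder link-depth limit
    },
    {
        base_url  => 'http://bbb.com/',
        email     => 'admin@aaa.com',
        max_depth => 5,
    },
    {
        base_url  => 'http://ccc.com/',
        email     => 'admin@aaa.com',
        max_depth => 5,
    },
);

1;    # the config file must return a true value
```

Each entry is spidered in turn and stays on its own host, so the three 
sites end up in one index without any changes to spider.pl.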

Rob
Received on Fri May 19 08:54:30 2006