On Wednesday, January 22, 2003, at 02:32 PM, Michael Tsai wrote:
> The problem is that the spider goes into an infinite loop. After going
> through all the pages on the site, it starts printing out entries like:
>
> Processing http://www.atpm.com//2.07/index.shtml...
> Processing http://www.atpm.com//2.06/index.shtml...
>
> where it adds a second forward slash after the domain name. If I leave
> it running long enough, it makes another pass over the pages with three
> slashes.

I was able to stop this from happening by putting:

    return if $uri->as_string =~ m[atpm\.com//];

in test_url in spider.conf.

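
For context, here is a minimal sketch of where that line might sit. It assumes test_url is a callback in the server's settings in spider.conf, that it receives a URI object as its first argument, and that a false return value makes the spider skip the URL. The surrounding sub and the final "return 1" are illustrative, not copied from the actual config.

```perl
# Fragment of the server's settings in spider.conf (other keys omitted).
test_url => sub {
    my ($uri) = @_;    # assumed: URI object for the link being considered

    # Skip any URL with a doubled slash right after the domain name,
    # e.g. http://www.atpm.com//2.07/index.shtml
    return if $uri->as_string =~ m[atpm\.com//];

    return 1;          # otherwise let the spider fetch it
},
```

This only filters out the bad URLs; it does not explain where the extra slash comes from.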
--Michael
Received on Fri Jan 24 02:10:59 2003