Hey all.
The spider.pl script that comes with Swish-E creates a user agent object
(LWP::UserAgent or LWP::RobotUA) for each server hash in the array generated
by the config file. That object is stored in the server hash and is not
released until the whole array of servers has been processed. As a result,
the connection to each server is not closed when spidering of that server
finishes.
Thus, when the spider runs over a large number of sites, a backlog of unclosed
connections builds up, which eventually prevents new connections from being
opened (at least on Win32).
To fix this, I've just removed the user agent from the server hash once
spidering of each server is complete, letting it close as control falls off
the end of that block of code.
That is, I've added a new line
$server->{ua} = undef;
at line 263 of 'spider.pl'.
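Roughly why this helps: Perl destroys an object as soon as its last reference
goes away. The sketch below is only an illustration, assuming spider.pl keeps
no other reference to the agent. It uses a hypothetical FakeUA class in place
of LWP::UserAgent (its DESTROY just records the host, standing in for the
point where a real user agent would drop its connections) and a made-up list
of three servers. Clearing $server->{ua} inside the loop destroys each agent
right away, instead of all of them at the end of the run.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LWP::UserAgent. DESTROY records the host,
# marking the moment a real agent would release its sockets.
package FakeUA;
my @closed;
sub new     { my ($class, $host) = @_; return bless { host => $host }, $class }
sub DESTROY { my $self = shift; push @closed, $self->{host} }
sub closed  { return @closed }

package main;

# Made-up server list; the real one comes from the spider config file.
my @servers = map { +{ base_url => $_ } } qw(a.example b.example c.example);

for my $server (@servers) {
    $server->{ua} = FakeUA->new($server->{base_url});
    # ... spider this server using $server->{ua} ...

    # Drop the only reference so the agent is destroyed now,
    # not when @servers itself goes away.
    $server->{ua} = undef;
}

print join(',', FakeUA::closed()), "\n";   # prints "a.example,b.example,c.example"
```

Without the undef line, none of the three agents would be destroyed until
@servers went out of scope, which mirrors the connection backlog described
above.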
I'm not a great or even experienced Perl coder (I taught myself about a month
ago), so there may be hidden reasons why this is a bad idea.
Either way, it works for me, and it stops Swish-E (well, 'spider.pl') from
dying after about 130 sites.
FYI, I'm working on Win2k, using Swish-E 2.2.1 (spider.pl v1.43, apparently).
I hope this is useful. If anyone can see why this is a bad idea, please tell
me. :)
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Trond Nilsen Alchemy Group
Software Engineer http://www.alchemy.co.nz
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Received on Mon Oct 7 07:30:20 2002