I realize that the behavior spider.pl has now is as
described in the docs, and that I brought this up on
the list about 9 months ago. Apologies for repeating
myself.
Though I still think the feature should work as I
describe below, I realize this decision's already been
made and documented, and I defer to the decisionaires.
:)
Thanks also for the workaround tips, Bill.
jrobinson
--- Bill Moseley <moseley@hank.org> wrote:
> On Tue, Sep 27, 2005 at 09:52:39AM -0700, J Robinson wrote:
> > Because I don't want to slam the server, even if I am
> > using keepalives to minimize the impact. Retrieving
> > pages off the server could still cause significant
> > load (e.g., dynamic pages). If I wanted to hit pages as
> > fast as possible, I'd set delay_sec to 0! :)
>
> Then you can just sleep in any of the callback functions.
>
> Normally, the point of a keep-alive connection is to make the best
> use of the web server's limited resources. If one client is holding a
> keep-alive connection open, then it should be busy using that
> connection. Otherwise, free it up for another client to use.
>
> You might want to look at other issues if it only takes one busy
> connection to kill the server's performance. But, again, it's easy to
> put a delay in, say, test_url if you want to add additional delay.
> Just don't delay so much that the web server kills the keep-alive
> connection. Then you are just making the problem worse.
>
> --
> Bill Moseley
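Bill's suggestion of adding a pause inside the test_url callback could look roughly like the spider.pl config fragment below. This is a minimal sketch, assuming the usual @servers layout from the spider.pl documentation, where test_url receives the URI object and the server hash and returns true to keep the URL. The base URL, email address and one-second pause are placeholders, not values from this thread. Keep the pause well under the server's keep-alive timeout, per the warning above.

```perl
# Hypothetical SwishSpiderConfig.pl fragment for spider.pl.
# Placeholder values (assumptions): base_url, email, and the 1-second sleep.
@servers = (
    {
        base_url   => 'http://www.example.com/',   # placeholder site
        email      => 'admin@example.com',         # placeholder contact address
        keep_alive => 1,                           # reuse one connection to the server
        test_url   => sub {
            my ( $uri, $server ) = @_;             # URI object and this server's hash
            # Pause each time spider.pl calls test_url, which slows the crawl.
            # Keep this short so the server does not drop the keep-alive connection.
            sleep 1;
            return 1;                              # true = keep this URL
        },
    },
);

1;    # the config file must end by returning a true value
```

The same sleep could go in another callback instead; test_url is used here only because it is the one named in the reply.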
Received on Tue Sep 27 10:18:25 2005