Re: spidering, delay_sec and keep_alive

From: J Robinson <jrobinson852(at)not-real.yahoo.com>
Date: Tue Sep 27 2005 - 17:18:23 GMT
I realize that the behavior spider.pl has now is as
described in the docs, and that I brought this up on
the list about 9 months ago. Apologies for repeating
myself.

Though I still think the feature should work as I
describe below, I realize this decision's already been
made and documented, and I defer to the decisionaires.
:)

Thanks also for the workaround tips, Bill.

jrobinson

--- Bill Moseley <moseley@hank.org> wrote:

> On Tue, Sep 27, 2005 at 09:52:39AM -0700, J Robinson wrote:
> > Because I don't want to slam the server, even if I am
> > using keepalives to minimize the impact. Retrieving
> > pages off the server could still cause significant
> > load (ie, dynamic pages). If I wanted to hit pages as
> > fast as possible, I'd set delay_sec to 0! :)
> 
> Then you can just sleep in any of the call-back functions.
> 
> Normally, the point of the keep-alive connection is to make the best
> use of the web server's limited resources.  If one client is holding a
> keep alive connection open then it should be busy using that
> connection.  Otherwise, free it up for another client to use.
> 
> Might want to look at other issues if it only takes one busy
> connection to kill the server's performance.  But, again, it's easy to
> put a delay in, say, test_url if you want to add additional delay.
> Just don't delay so much that the web server kills the keep-alive
> connection.  Then you are just making the problem worse.
> 
> -- 
> Bill Moseley
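
For reference, the workaround Bill describes (sleeping inside a
call-back such as test_url) might look roughly like the config sketch
below. This is only an illustration under assumptions: the base_url,
the email address and the $extra_delay value are placeholders, not
settings taken from this thread, and the layout follows the usual
spider.pl config style of filling in @servers.

```perl
# Sketch of a spider.pl config file; all values below are placeholders.
my $extra_delay = 2;    # hypothetical pause, in seconds, before each URL

my %server = (
    base_url   => 'http://www.example.com/',    # placeholder site
    email      => 'admin@example.com',          # placeholder contact address
    keep_alive => 1,                            # reuse one connection

    # test_url is called for each URL before it is fetched.
    # Sleeping here adds a pause even while keep-alive is in use.
    test_url   => sub {
        my ( $uri, $server ) = @_;
        sleep $extra_delay;
        return 1;    # true: go ahead and fetch this URL
    },
);

@servers = ( \%server );

1;
```

As Bill notes, the pause should stay well below the web server's
keep-alive timeout, otherwise the server drops the connection and each
request pays for a new one.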


		
Received on Tue Sep 27 10:18:25 2005