
Re: request delay problem with spider.pl

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Jul 05 2005 - 18:18:07 GMT
On Tue, Jul 05, 2005 at 09:13:08AM -0700, Aliasgar Dahodwala wrote:
> I am running swish-e 2.4.3 on a redhat linux box. I am using the 
> included spider.pl script to spider my website.
> 
> My problem: When I enable the keep_alive directive of the spider program 
> and set delay_sec to 5, the spider fetches pages at blazing speed, 
> ignoring the delay_sec directive. After going through around 5000 
> pages it catches up on all the delay: it stops fetching any more 
> pages and just keeps sleeping for 5 seconds at a time. After a long 
> wait it continues from where it left off.

That sounds like a bug.  By design, the spider ignores the delay_sec
setting on a keep-alive connection.  The point of keep-alive is to allow
faster requests by avoiding the time required to set up a new connection.

From the docs:

# delay_sec

    This optional key sets the delay in seconds to wait between
    requests.  See the LWP::RobotUA man page for more information. The
    default is 5 seconds. Set to zero for no delay.

    When using the keep_alive feature (recommended) the delay will be
    used only where the previous request returned a "Connection:
    closed" header.
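As a rough illustration of the documented rule (not the actual spider.pl code), the decision can be sketched as below. The helper name should_delay is hypothetical, and the sketch assumes the standard HTTP/1.1 header value "close" rather than the "closed" wording in the docs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of spider.pl): decide whether delay_sec
# should be honored before the next request. With keep_alive off, always
# delay; with keep_alive on, delay only if the previous response said
# "Connection: close" (the standard HTTP/1.1 token).
sub should_delay {
    my ( $keep_alive, $connection_header ) = @_;
    return 1 unless $keep_alive;                  # no keep-alive: always delay
    return 0 unless defined $connection_header;   # no header: connection stays open
    return lc($connection_header) =~ /\bclose\b/ ? 1 : 0;
}

printf "%d\n", should_delay( 0, undef );          # 1
printf "%d\n", should_delay( 1, 'Keep-Alive' );   # 0
printf "%d\n", should_delay( 1, 'close' );        # 1
```

So on a long-lived keep-alive connection the delay is skipped entirely until the server closes it (for example after MaxKeepAliveRequests).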


So after fetching 5000 docs (is your MaxKeepAliveRequests set to
5000?), you are saying that the spider sleeps delay_sec seconds for
each of those 5000 requests before it fetches any more documents?

Let's see, the wait time is set here:

    my $wait = $server->{delay_sec} - ( time - $server->{last_response_time} );
    return unless $wait > 0;
    sleep( $wait );

That last_response_time is the time the last request completed, which
should normally be almost the same as the current time, so $wait ends
up close to delay_sec.  So I don't see how it could delay for more
than delay_sec.
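To make the arithmetic concrete, here is a minimal standalone sketch of the quoted calculation. The function name wait_seconds and passing $now in explicitly (instead of calling time) are assumptions for illustration; the point is that the result can never exceed delay_sec, no matter how many requests came before.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the wait calculation quoted above.
# Returns how many seconds to sleep; 0 means no sleep is needed.
sub wait_seconds {
    my ( $delay_sec, $last_response_time, $now ) = @_;
    my $wait = $delay_sec - ( $now - $last_response_time );
    return $wait > 0 ? $wait : 0;
}

my $now = 1_000;
print wait_seconds( 5, $now,      $now ), "\n";   # 5: response just finished
print wait_seconds( 5, $now - 2,  $now ), "\n";   # 3: 2 seconds already elapsed
print wait_seconds( 5, $now - 10, $now ), "\n";   # 0: delay already satisfied
```

Because only the most recent response time enters the formula, skipped delays cannot accumulate into one long sleep.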

Is that what you mean?

-- 
Bill Moseley
moseley@hank.org

Received on Tue Jul 5 11:18:07 2005