Re: request delay problem with

From: Bill Moseley <moseley(at)>
Date: Tue Jul 05 2005 - 20:59:00 GMT
On Tue, Jul 05, 2005 at 03:19:01PM -0400, Aliasgar Dahodwala wrote:
> what i failed to find out is, why does the spider sleep, something 
> around 5000 x delay_sec after fetching somewhere around 5824 files.
> (the exact count value is 5824).  In the debug file i have that many 
> "sleeping 5 seconds" messages, before the spider starts fetching again.
> so i am thinking there is a bug in there somewhere.

Sounds like it.  What's magic about 5824, I wonder.  In my version,
delay_request() is called inside the spider() function.  It's not the
best place to call delay_request() because the request isn't really
being made at that point (test_url could skip the request, for
example).  But that's why the wait time is calculated based on the
last time a request really was completed.

Having a bunch of "sleeping 5 seconds" in there without any other
requests happening doesn't make sense.
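To make the reasoning concrete, here is a rough sketch of a wait
calculation based on the time the last request completed.  This is
not the spider's actual code: seconds_to_wait() and its arguments are
hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not the spider's real delay_request(): return how
# many seconds to sleep so that at least $delay_sec seconds separate the
# completion of the last request from the start of the next one.
sub seconds_to_wait {
    my ( $delay_sec, $last_response_time, $now ) = @_;

    # No delay configured, or no request has completed yet.
    return 0 unless $delay_sec && $last_response_time;

    my $elapsed = $now - $last_response_time;
    return $elapsed >= $delay_sec ? 0 : $delay_sec - $elapsed;
}

# A request finished at t=100 and the next one is considered at t=102.
my $wait = seconds_to_wait( 5, 100, 102 );
print "sleeping $wait seconds\n" if $wait > 0;
```

Under this scheme a second call made after the sleep, with no request
in between, returns 0 because the elapsed time already exceeds
delay_sec.  That is why a long run of full-length sleeps with no
fetches in between looks wrong.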

Can you generate a simple test case?  This is what I did:


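    #!/usr/bin/perl
    # test.cgi: every page links to the next one until count passes 6000.
    use strict;
    use warnings;

    # Next counter value: one more than the count= parameter, or 1 if absent.
    my $count = ( $ENV{QUERY_STRING} || '' ) =~ /count=(\d+)/ ? $1 + 1 : 1;

    # End the chain of links with a 404 once 6000 documents were served.
    if ( $count > 6000 ) {
        print <<EOF;
    Content-Type: text/html
    Status: 404 Not Found

    <html><body>Not found</body></html>
    EOF
        exit;
    }

    # Normal case: a tiny document that links to the next count.
    print <<EOF;
    Content-Type: text/html

    <head><title>This is doc $count</title></head>
    <a href="test.cgi?count=$count">Rec$count</a>
    EOF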


    # httpd.conf
    Include /etc/apache/modules.conf
    ErrorLog error_log
    PidFile  pid_file
    ServerName localhost

    TypesConfig /dev/null
    Listen 4321

    DocumentRoot /home/moseley/apache

    <files test.cgi>
        Options +ExecCGI
        SetHandler cgi-script
    </files>

    moseley@bumby:~/apache$ cat spider.conf 
    @servers = (
        {
            base_url => 'http://localhost:4321/test.cgi',
            delay_sec => 5,
            keep_alive => 1,
            email => 'moseley@localhost',
        },
    );

Start apache:

    moseley@bumby:~/apache$ /usr/sbin/apache -d `pwd` -f httpd.conf

Run the spider (modified to print the sleeping messages without debug enabled):

    moseley@bumby:~/apache$ ./ spider.conf >/dev/null
    ./ Reading parameters from 'spider.conf'
    sleeping 5 seconds
    sleeping 5 seconds
    Summary for: http://localhost:4321/test.cgi
         Connection: Close:      60  (0.1/sec)
    Connection: Keep-Alive:   5,941  (14.5/sec)
               Total Bytes: 698,679  (1704.1/sec)
                Total Docs:   6,000  (14.6/sec)
               Unique URLs:   6,001  (14.6/sec)

So it fetched 6000 docs and the sleeping messages went as expected.
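
As a sanity check on those totals, the chain of links that test.cgi
produces can be walked without any web server.  This is only a sketch:
next_count() is a hypothetical helper that mirrors the counter logic
in the script, and it assumes the URL that finally returns the 404 is
still counted as a unique URL.

```perl
use strict;
use warnings;

# Hypothetical mirror of the counter logic in test.cgi: given a query
# string, return the next count, or undef where the script answers 404.
sub next_count {
    my ($query) = @_;
    my $count = ( $query || '' ) =~ /count=(\d+)/ ? $1 + 1 : 1;
    return $count > 6000 ? undef : $count;
}

my ( $docs, $urls ) = ( 0, 0 );
my $query = '';    # the first request is plain test.cgi

while (1) {
    $urls++;       # every URL in the chain is requested once
    my $count = next_count($query);
    last unless defined $count;    # 404: no document and no further link
    $docs++;
    $query = "count=$count";
}

print "docs=$docs urls=$urls\n";
```

This prints docs=6000 urls=6001, which agrees with the Total Docs and
Unique URLs lines in the summary.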

Is there a way you can demonstrate what you are seeing so I can repeat it?

Bill Moseley

Received on Tue Jul 5 13:59:01 2005