
Re: random crashing of spider.pl!?

From: Justin Tang <justin.tang(at)not-real.positionresearch.com>
Date: Mon Jan 17 2005 - 16:20:12 GMT
Hi all:
  I think I figured out what happened, but I don't know how to solve it.  What
seems to happen is that the spider is put to sleep when it can't connect to
the site (it appears to be prompting for a username and password, even though
I already set credential_timeout to undef), and since I forked the spider off
as a detached background process, it gets killed while it sleeps.  Is there
any way around the spider being put to sleep?  Here is a copy of the settings
I have in my config file.

my %server1 = (
        base_url  => 'http://xxx.xxx.xxx/',
        agent     => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
        email     => 'blank@blank.com',
        link_tags => [qw/ a /],
        debug     => DEBUG_ERRORS | DEBUG_FAILED | DEBUG_SKIPPED |
                     DEBUG_HEADERS | DEBUG_INFO | DEBUG_URL,
        delay_sec => 0,
        #max_wait_time => 1,
        keep_alive => 1,
        max_time   => 10,
        max_size   => 1_000_000,
        max_files  => 100,
        max_depth  => 10,
        use_md5    => 1,
        credentials        => 'username:password',
        credential_timeout => undef,
        use_cookies        => 1,
        use_head_requests  => 1,
        test_url => \&checkURL,    # checks for spider traps
        test_response => sub {
                my ( $uri, $server, $response ) = @_;

                print "Checking response...\n";
                print "Was the page successfully retrieved? ",
                      $response->is_success, "\n";

                # Skip this page entirely if the fetch failed.
                if ( !$response->is_success ) {
                        $server->{no_spider}++;
                        return;
                }
                print "Page fetched correctly\n";

                print "Checking header for $uri\n";
                my $safeSpider = SpiderTraps->new;

                my $headerResult = $safeSpider->headerCheck(
                        $response->content_type,
                        $response->code,
                        '/var/log/linkverification/linkcommand/592.spider',
                        $uri,
                );

                print "The result from header check is --> $headerResult\n";

                #$server->{no_spider}++ if $headerResult == 0;
                return 1;    # tell spider.pl to keep processing this page
        },
    );
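The checkURL routine passed to test_url above is not shown in the message.
For reference, a test_url callback in spider.pl receives the URI object and
the server hash, and returns true to allow the fetch.  A hypothetical sketch
of what a spider-trap check might look like (the patterns below are
illustrative guesses, not Justin's actual code):

# Hypothetical sketch of a test_url spider-trap check; the specific
# patterns are illustrative assumptions, not the real checkURL code.
sub checkURL {
    my ( $uri, $server ) = @_;    # spider.pl passes a URI object and the server hash

    # Skip common spider traps: calendar pages, session IDs in the
    # query string, and paths where the same segment repeats endlessly.
    return 0 if $uri->path =~ m{/calendar/}i;
    return 0 if defined $uri->query && $uri->query =~ /(?:PHPSESSID|sessionid)=/i;
    return 0 if $uri->path =~ m{(/[^/]+)\1{3,}};    # e.g. /a/a/a/a/...

    return 1;    # URL looks safe; let the spider fetch it
}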



I've been stuck on this for so long... If anyone can help me out of it, I
would be so grateful...
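If the credential prompt really is what puts the spider to sleep, one
workaround (a sketch of my own, not something confirmed in this thread; the
file names are placeholders) is to launch the forked spider fully detached,
with STDIN redirected to /dev/null so any read from the terminal fails
immediately instead of blocking:

# A minimal launcher sketch: fork the spider detached from the terminal,
# with STDIN on /dev/null so an interactive prompt gets EOF right away.
use strict;
use warnings;
use POSIX qw(setsid);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {    # child: run the spider fully detached
    setsid() or die "setsid failed: $!";    # drop the controlling terminal
    open STDIN,  '<',  '/dev/null'       or die "STDIN: $!";
    open STDOUT, '>>', '/tmp/spider.log' or die "STDOUT: $!";   # placeholder path
    open STDERR, '>&', \*STDOUT          or die "STDERR: $!";
    exec 'perl', 'spider.pl', 'spider_config.pl' or die "exec failed: $!";
}

waitpid $pid, 0;    # reap the child so it never lingers as a zombie

Reaping the child with waitpid is also what keeps it from being left behind
as a zombie after it exits.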

-Justin

-----Original Message-----
From: swish-e@sunsite3.berkeley.edu
[mailto:swish-e@sunsite3.berkeley.edu] On Behalf Of Bill Moseley
Sent: Friday, January 14, 2005 10:34 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: random crashing of spider.pl!?


On Fri, Jan 14, 2005 at 04:50:31PM -0800, Justin Tang wrote:
> Hi all:
>
>   I'm trying to use spider.pl for a verification tool, and it seems to be
> crashing randomly on me!!! As far as I can tell, it seems to die somewhere
> between the test_url and the test_response callback functions.  Now does
> anyone know what's a response that could kill the spider completely?
> Thanks...

Are you on shared hosting?  I had that exact problem once, and it
turned out that the hosting provider had a script that killed any user
process that ran for more than a few minutes.

Otherwise, what kind of crash?  I think the program traps
$SIG{__DIE__}, so it should report those kinds of errors.  It doesn't
trap any other signals, though it does catch SIGHUP as a way to cleanly
abort the spider.
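
To see what is actually killing the process, one option (a sketch of my own,
assuming a Unix host; the log path is a placeholder, and spider.pl installs
its own handlers, so this just illustrates the technique) is to run the
spidering code under a small wrapper that logs die() messages and catchable
signals before the process disappears:

# Diagnostic sketch: log fatal errors and common fatal signals so a
# "random" crash leaves a trace behind.
use strict;
use warnings;
use IO::Handle;

open my $log, '>>', '/tmp/spider-crash.log' or die "can't open log: $!";
$log->autoflush(1);

# Record the die() message before the process exits.
$SIG{__DIE__} = sub { print {$log} "died: $_[0]" };

# Record the signals most likely to be sent by a watchdog or the kernel.
for my $sig (qw( INT TERM HUP PIPE )) {
    $SIG{$sig} = sub { print {$log} "caught SIG$sig\n"; exit 1 };
}

# ... launch or require the spidering code here ...

Note that SIGKILL cannot be caught, so if the log stays empty while the
process still vanishes, a kill -9 from a watchdog like the shared-hosting
script described above is a likely suspect.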

--
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list:
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu