Re: random crashing of!?

From: Justin Tang <justin.tang(at)>
Date: Mon Jan 17 2005 - 16:20:12 GMT
Hi all:
  I think I figured out what happened, but I don't know how to solve it.  I
think the spider is put to sleep when it can't connect to the site (it seems
to be asking me for a user name and password, even though I already set
credential_timeout to undef).  I forked the spider out as a zombie program,
so when it sleeps the process is killed.  Is there any way to keep the
spider from being put to sleep?  Here is a copy of the settings I have in my
config file.

my %server1 = (
        base_url => '',
        agent => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
        email => '',
        link_tags => [qw/ a /],
        delay_sec => '0',
        #max_wait_time => '1',
        keep_alive => 'true',
        max_time => '10',
        max_size => '1000000',
        max_files => '100',
        max_depth => '10',
        use_md5 => 'true',
        credentials => 'username:password',
        credential_timeout => undef,
        use_cookies => 'true',
        use_head_requests => 'true',
        test_url => \&checkURL, #checks for spider traps
        test_response => sub{
                my $server = $_[1];

                print "Checking response...\n";
                print "Was the page successfully retrieved?\n";
                $server->{no_spider}++ if !$_[2]->is_success;

                print "Page fetched correctly\n";

                print "Checking header for $_[0]\n";
                my $safeSpider = new SpiderTraps;

                my $headerResult = $safeSpider->headerCheck(
                        $_[2]->content_type, $_[2]->code,
                        "/var/log/linkverification/linkcommand/592.spider",
                        $_[0]);

                print "The result from header check is --> $headerResult\n";

                #$server->{no_spider}++ if $headerResult == 0;

I've been stuck on this for so long... If anyone can help me out of it, I
would be so grateful...
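
One thing that may be worth ruling out here: a background process that tries to read from its controlling terminal (for example, to prompt for a user name and password) is normally stopped by the kernel with SIGTTIN, which could look like the spider being "put to sleep". The following is a minimal sketch, not the spider's own launch code, of starting a command fully detached from the terminal with its standard input coming from /dev/null. The name spawn_detached and the paths in the usage comment are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Start @command in its own session with no terminal attached.
# Output and errors are appended to $log_file. Returns the child PID.
# (spawn_detached is a hypothetical helper name, not part of the spider.)
sub spawn_detached {
    my ($log_file, @command) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;    # parent: hand back the child's PID

    # Child: leave the terminal's session so terminal job control
    # (SIGTTIN/SIGTTOU) and terminal hangups no longer apply to it.
    defined POSIX::setsid() or POSIX::_exit(126);

    # A read from /dev/null returns end-of-file immediately instead of
    # blocking on a prompt that nobody can answer.
    open STDIN,  '<',  '/dev/null' or POSIX::_exit(126);
    open STDOUT, '>>', $log_file   or POSIX::_exit(126);
    open STDERR, '>&', \*STDOUT    or POSIX::_exit(126);

    # Replace the child with the command; only reached again on failure.
    exec { $command[0] } @command or do {
        warn "exec failed: $!\n";
        POSIX::_exit(127);
    };
}

# Hypothetical usage (script and config names are placeholders):
#   my $pid = spawn_detached('/tmp/spider.log', 'perl', 'spider.pl', 'my.config');
#   waitpid($pid, 0);   # reap the child so it does not linger as a zombie
```

The parent still has to collect the child's exit status, either with waitpid as in the comment above or by setting $SIG{CHLD} = 'IGNORE' before forking; otherwise the finished child remains in the process table as a zombie.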


-----Original Message-----
On Behalf Of Bill Moseley
Sent: Friday, January 14, 2005 10:34 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: random crashing of!?

On Fri, Jan 14, 2005 at 04:50:31PM -0800, Justin Tang wrote:
> Hi all:
>   I'm trying to use for some verification tool, and it seems to be
> crashing randomly on me!!! As far as I can tell, it seems to die somewhere
> between the test_url and the test_response callback functions.  Does
> anyone know what kind of response could kill the spider completely?

Are you on shared hosting?  I had that exact problem once and it
turned out that the hosting provider had a script that killed any user
process that ran more than a few minutes.

Otherwise, what kind of crash?  I think the program traps
$SIG{__DIE__}, so it should report those kinds of errors.  It doesn't
trap any other signals, except that it catches SIGHUP as a way to cleanly
abort the spider.
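
To narrow down what kind of crash it is, it can help to record every way the process might end. The following is a minimal sketch, assuming it runs in a small wrapper or test script rather than inside the spider itself; the log path and the log_event helper are hypothetical. SIGHUP is left alone because, as noted above, the spider uses it for a clean abort.

```perl
use strict;
use warnings;

# Hypothetical log location, chosen only for illustration.
my $log_file = '/tmp/spider-debug.log';

# Append one timestamped line, tagged with the process ID, to the log.
sub log_event {
    my ($message) = @_;
    chomp $message;
    open my $fh, '>>', $log_file or return;
    print {$fh} sprintf("%s [%d] %s\n", scalar(localtime), $$, $message);
    close $fh;
}

# Log fatal Perl errors. When the handler returns, die carries on as usual.
$SIG{__DIE__} = sub {
    return if $^S;    # inside an eval the error may still be caught
    log_event("fatal: $_[0]");
};

# Log common terminating signals, then exit so the END block still runs.
for my $name (qw/ TERM INT PIPE ALRM /) {
    $SIG{$name} = sub {
        log_event("caught SIG$name");
        exit 1;
    };
}

# Runs on any normal exit, including the exit calls above.
END { log_event("exiting with status $?") }
```

If the log shows no "exiting" line at all, the process was most likely ended by a signal that cannot be caught or was not handled here, such as SIGKILL from a host-side watchdog script.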

Bill Moseley

Received on Mon Jan 17 08:20:20 2005