Checking for 404 responses in header

From: Mark Jordan <mjordan(at)not-real.sfu.ca>
Date: Thu May 06 2004 - 18:20:04 GMT

Hi,

Can anyone tell me how to configure spider.pl so that it doesn't follow broken links? I'm calling what I believe is the
HTTP::Response 'code' method in my test_response subroutine, but it doesn't seem to work. Here is my config code:


my %server = (
        base_url => 'http://www.lib.sfu.ca/',
        email => 'mjordan@sfu.ca',
        test_url => sub {  $_[0]->path !~ /(\.gif|\.jpg|\.jpeg)/i },

        test_response => sub {  $_[2]->code !~ /404/ }, # <- This doesn't seem to work

        credential_timeout => '0',
        keep_alive => 1
        );
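
For reference, here is a minimal standalone sketch of the same status check written as a numeric comparison. It assumes, as the config above does, that the third argument passed to test_response is an HTTP::Response object. The $test_response variable and the loop are hypothetical test scaffolding, not part of spider.pl. The sketch only confirms that the callback logic rejects a 404 status. It does not explain why the check appears to fail inside the spider, and it cannot catch a server that returns its "not found" page with a 200 status.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical standalone check, not part of spider.pl itself.
# Assumption: the callback's third argument is an HTTP::Response object.
my $test_response = sub {
    my ( $uri, $server, $response ) = @_;
    return $response->code != 404;    # false only when the status is 404
};

# Exercise the callback with hand-built responses.
for my $status ( 200, 404 ) {
    my $response = HTTP::Response->new($status);
    my $verdict  = $test_response->( undef, {}, $response ) ? 'keep' : 'skip';
    print "$status => $verdict\n";
}
```

Run on its own, this prints "200 => keep" and "404 => skip".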

Am I on the right track? Or is there a better way to make spider.pl not follow broken links, or at least not index
404 Not Found response pages?

Thanks,

Mark



Mark Jordan
Acting Coordinator of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Phone (604) 291 5753 / Fax (604) 291 3023                 
mjordan(at)not-real.sfu.ca / http://www.sfu.ca/~mjordan/

 
Received on Thu May 6 11:20:08 2004