Re: Checking for 404 responses in header

From: Mark Jordan <mjordan(at)not-real.sfu.ca>
Date: Thu May 06 2004 - 18:58:58 GMT
Hi Bill,

On Thu, May 06, 2004 at 11:25:44AM -0700, Bill Moseley wrote:
> On Thu, May 06, 2004 at 11:19:13AM -0700, Mark Jordan wrote:
> > Hi,
> > 
> > Can anyone tell me how to configure spider.pl so that it doesn't
> > follow broken links?
> 
> It has to follow them to find out they are broken.
>

OK, you got me.
 
> > Am I on the right track, is there a better way to make spider.pl not
> > follow broken links, or at least not index 404 not found response
> > pages?
> 
> Are your 404 pages indeed returning a status of 404?
> 

You're right. Using wget, I confirmed that the links being reported as broken
are returning a 200 response. We have a custom "not found" handler, but I
assumed it was still returning 404. Let me see if I can configure the web
server to return the proper response code and then retry my spider
configuration. If I have the server return 404, should my spider config work
as expected?
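
For anyone who wants to repeat the status check without wget, here is a
minimal sketch in Python using only the standard library. The function name
fetch_status is hypothetical and not part of spider.pl or any other tool. It
simply reports the status code the server sends for a URL, so a custom "not
found" page that answers 200 shows up as 200 rather than 404.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code the server sends for `url`.

    Hypothetical helper for illustration only. Redirects are followed,
    so the code reported is the one for the final response.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses;
        # the status code is available on the exception.
        return err.code
```

Calling fetch_status on a URL that is known not to exist should give 404 once
the server is configured correctly. If it gives 200, the "not found" handler
is still answering with a success status, and a crawler has no way to tell
the error page from a real one.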

Mark


Mark Jordan
Acting Coordinator of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Phone (604) 291 5753 / Fax (604) 291 3023                 
mjordan(at)not-real.sfu.ca / http://www.sfu.ca/~mjordan/

 
Received on Thu May 6 11:58:59 2004