Re: Spider Design Flaw!

From: Ron Samuel Klatchko <rsk(at)not-real.corpmail.brightmail.com>
Date: Sun Feb 27 2000 - 01:54:57 GMT
A couple of solutions to this problem:

1) Fix it via your web server (most web servers let you specify the name
of the default page for a directory).  This solves the problem without any
software development and makes better use of your server and bandwidth
resources, since clients no longer fetch a stub page only to be redirected.

2) Have the spider handle the tag.  Before writing the response file, check
whether this META tag is present and, if so, translate it into an HTTP
redirect: write the response with an HTTP permanent redirect status (301
Moved Permanently) and then write the new URL on the next line.
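A minimal sketch of option 2, written in Python only to illustrate the logic: it finds a `<meta http-equiv="refresh">` tag with the standard library's `html.parser`, pulls the target out of a CONTENT value such as `1; URL=...`, resolves relative targets with `urljoin`, and formats a permanent-redirect response. The names `RefreshFinder`, `refresh_target` and `redirect_response`, and the two-line response layout (status line, then the new URL), are hypothetical and assumed for illustration; they are not taken from the spider's actual code or response-file format.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class RefreshFinder(HTMLParser):
    """Hypothetical helper: records the CONTENT of the first META refresh tag."""

    def __init__(self) -> None:
        super().__init__()
        self.content: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so "META" and
        # "HTTP-EQUIV" arrive as "meta" and "http-equiv".
        if tag != "meta" or self.content is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() == "refresh":
            self.content = attr.get("content", "")


def refresh_target(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL a META refresh points to, or None if absent."""
    finder = RefreshFinder()
    finder.feed(html)
    finder.close()
    if finder.content is None:
        return None
    # CONTENT looks like "1; URL=http://..." (delay, semicolon, target).
    _, sep, rest = finder.content.partition(";")
    if not sep:
        return None  # a bare delay only reloads the same page
    rest = rest.strip()
    if rest.lower().startswith("url="):
        rest = rest[4:].strip()
    rest = rest.strip("'\"")
    # Resolve relative targets against the page's own URL.
    return urljoin(base_url, rest) if rest else None


def redirect_response(target: str) -> str:
    """Assumed layout: a 301 status line, then the new URL on the next line."""
    return "HTTP/1.0 301 Moved Permanently\n" + target + "\n"


if __name__ == "__main__":
    page = ('<META HTTP-EQUIV="Refresh" CONTENT="1;\n'
            'URL=http://www.prophezine.com/search/database/archives.html">')
    target = refresh_target(page, "http://www.prophezine.com/search/database/")
    if target is not None:
        print(redirect_response(target), end="")
```

The spider would then treat the stub index.html exactly like a server-side redirect and queue the new URL, so the real archive page gets indexed instead of the empty stub.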

moo

On Sat, 26 Feb 2000, PropheZine Owner wrote:
> As you can tell from the number of posts I have sent, I am experimenting
> with spidering and also AutoSwish.  Thank you all for your help.
> 
> Here is a design flaw.  I'm not knocking anyone, as I think the software is
> wonderful.  I wish I knew C and Perl well enough to offer modifications.
> 
> I have a website that is 4+ years old.  Back then we created a directory
> (actually we have this problem in many directories) and, instead of an
> index.html, we had a file named archives.html.  We added SSI at some point,
> and since the search engines had archives.html indexed, we created
> archives.shtml and turned archives.html into a redirect page.
> 
> Later we created an index.html and inserted this code:
> 
>   <META HTTP-EQUIV="Refresh" CONTENT="1;
> URL=http://www.prophezine.com/search/database/archives.html">
> 
> It turns out that when I put http://www.prophezine.com/search/database/ in
> the config file, the spider indexes only the index.html page that is
> returned.  That page has some meta tags but no body.
> 
> What is needed is a change to the spider so that it follows the refresh
> tag.  I am not sure of all the possible tags, so there may be others to
> follow, but this one should definitely be followed.
> 
> Thoughts?
> 
> Bob
> 
> 
Received on Sat Feb 26 21:00:00 2000