RE: HTTP Crawler

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu May 02 2002 - 16:50:56 GMT
At 09:27 AM 05/02/02 -0700, Hsiao Ketung Contr 61 CS/SCBN wrote:
>User-Agent: *
>Disallow: /somedirectory/
>Disallow: /somedirectory/
>..
>
>What does robots.txt do, and
>what's your suggestion?

Google is your friend.

http://www.robotstxt.org/wc/robots.html
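
In short, robots.txt is a file at the root of a web site that tells
well-behaved crawlers which paths they should stay out of.  As a rough
illustration (not part of swish-e or spider.pl), here is a minimal sketch
using Python's standard urllib.robotparser to show how a crawler would read
the rules quoted above.  The example.com host, the "example-spider" agent
name, and the allowed() helper are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the message; "/somedirectory/" is the placeholder
# path from the original question, not a real directory.
rules = """\
User-Agent: *
Disallow: /somedirectory/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse in-memory lines instead of fetching over HTTP


def allowed(url, agent="example-spider"):
    """Return True if the rules permit `agent` to fetch `url`.

    `agent` is a hypothetical crawler name used only for illustration.
    """
    return parser.can_fetch(agent, url)


# Anything under /somedirectory/ is off limits to every crawler ("*").
print(allowed("http://example.com/somedirectory/page.html"))  # False
# Paths outside the disallowed prefix remain fetchable.
print(allowed("http://example.com/index.html"))  # True
```

A crawler that honors robots.txt simply skips any URL for which a check
like this comes back false.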

If you use -S prog with spider.pl, you can tell it to ignore robots.txt.
But I'd suggest getting the -S http method working first, before you try
to tackle the -S prog / spider.pl setup with swish.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Thu May 2 16:51:00 2002