
RE: HTTP Crawler

From: Hsiao Ketung Contr 61 CS/SCBN <KETUNG.HSIAO(at)not-real.LOSANGELES.AF.MIL>
Date: Thu May 02 2002 - 16:27:39 GMT

This is interesting.
There is a http://my-intranet-server-name/robots.txt file, and
its timestamp is June 1999, before I took this job.
I'll have to find out what it does and whether I can temporarily remove or
rename it, then try running swishspider again.

Its contents are:

User-Agent: *
Disallow: /somedirectory/
Disallow: /somedirectory/

What does robots.txt do, and
what do you suggest?
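For reference, robots.txt follows the Robots Exclusion Protocol. A well-behaved crawler fetches it first and skips any URL whose path begins with a `Disallow` prefix listed under a matching `User-Agent` line (`*` matches every crawler). The minimal sketch below checks the file quoted above with Python's standard `urllib.robotparser` module, not with swishspider itself. The user-agent string `swishspider` is a hypothetical value, and it is an assumption that swishspider applies robots.txt the same way. Under these standard rules only paths starting with `/somedirectory/` are blocked, so `/tmp.html` should be allowed.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents quoted above; the repeated Disallow line is harmless.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /somedirectory/
Disallow: /somedirectory/
"""


def is_allowed(url: str, user_agent: str = "swishspider") -> bool:
    """Return True if the robots.txt rules let user_agent fetch url.

    The default user_agent is a hypothetical name; with "User-Agent: *"
    every agent gets the same rules anyway.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://my-intranet-server-name/tmp.html",
                "http://my-intranet-server-name/somedirectory/page.html"):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Run as a script, it should report `/tmp.html` as allowed and the `/somedirectory/` page as blocked. If that matches the server's file, robots.txt is unlikely to be what stops swishspider from fetching `/tmp.html`.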

 	Ketung Hsiao
 	Web Admin/Developer

-----Original Message-----
From: David L Norris
Sent: Wednesday, May 01, 2002 10:36 PM
To: Multiple recipients of list
Subject: [SWISH-E] RE: HTTP Crawler

On Wed, 2002-05-01 at 18:37, Hsiao Ketung Contr 61 CS/SCBN wrote:
> But if I run the following (from src directory)
> ./swishspider . http://my-intranet-server-name/tmp.html.

Is there a robot control file blocking the URL?

 David Norris
  Dave's Web
  Augury Net
  ICQ - 412039
Received on Thu May 2 16:27:45 2002