
RE: HTTP Crawler

From: Hsiao Ketung Contr 61 CS/SCBN <KETUNG.HSIAO(at)not-real.LOSANGELES.AF.MIL>
Date: Thu May 02 2002 - 16:51:07 GMT
Bill,

..response contains 500, like this:
-rw-r--r--   1 root     other       5321 May  1 16:21 ..contents
-rw-r--r--   1 root     other        638 May  1 16:21 ..links
-rw-r--r--   1 root     other          4 May  1 16:32 ..response

% more ..response
500
%

Judging by the timestamps, ..response changed but the other two files
did not.  I'll have to find out what ..response = 500 means.
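A first guess: 500 is the generic HTTP "Internal Server Error" status,
but LWP also reports its own connection failures (e.g. "Can't connect
to host") as 500, so the full status line matters.  A quick way to
re-run the request by hand, assuming LWP::Simple is installed (it comes
with the LWP that swishspider needs):

  % perl -MLWP::Simple -e 'getprint "http://my-intranet-server-name/tmp.html"'

On failure, getprint prints the status line and message to STDERR.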



-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org]
Sent: Wednesday, May 01, 2002 4:45 PM
To: Hsiao Ketung Contr 61 CS/SCBN; Multiple recipients of list
Subject: RE: [SWISH-E] HTTP Crawler


At 04:36 PM 05/01/02 -0700, Hsiao Ketung Contr 61 CS/SCBN wrote:
>
>But if I run the following (from src directory)
>./swishspider . http://my-intranet-server-name/tmp.html.
>
>The content in ..links is unchanged.

How about the response code and the content?


>So, the run for the intranet URL is not working.
>How do I get swishspider to run on the intranet, too?

Find out what's blocking the request.
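Fetch the URL directly and look at the full status line.  A minimal
sketch, assuming LWP is installed (swishspider requires it) and using
your example hostname:

  #!/usr/local/bin/perl -w
  use strict;
  use LWP::UserAgent;
  use HTTP::Request;

  # Fetch the URL much like swishspider does, then dump the status
  # line, headers, and body so the actual error text is visible.
  my $url = shift || 'http://my-intranet-server-name/tmp.html';
  my $ua  = LWP::UserAgent->new;
  my $response = $ua->request( HTTP::Request->new( GET => $url ) );

  print $response->status_line, "\n\n";
  print $response->headers_as_string, "\n";
  print $response->content;

If the status line says something like "500 Can't connect to ...", the
problem is a firewall or proxy in the way, not the web server itself.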


>Can anyone please shed some light on this one ?
>
>>$url =~ s/http\:\/\/www\.losangeles\.af\.mil\///;
>>	into  the while loop in
>>	sub search_parse.
>Yes, the above is Perl code.  It blanks out
>http://www.losangeles.af.mil/ in the $url variable.

Yes, I know what it does.  I just don't know what it applies to.  Some
CGI script you are running?

If all the slashes make you dizzy, you might try:

  $url =~ s[\Qhttp://www.losangeles.af.mil][];

\Q just makes the dots match literally instead of as wildcards; it's
probably not needed here.
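For example, with a made-up path:

  my $url = 'http://www.losangeles.af.mil/images/seal.gif';
  $url =~ s[\Qhttp://www.losangeles.af.mil][];
  print "$url\n";    # prints "/images/seal.gif"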



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Thu May 2 16:51:10 2002