
RE: HTTP Crawler

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed May 01 2002 - 23:45:49 GMT
At 04:36 PM 05/01/02 -0700, Hsiao Ketung Contr 61 CS/SCBN wrote:
>
>But if I run the following (from src directory)
>./swishspider . http://my-intranet-server-name/tmp.html.
>
>The content in ..links is unchanged.

How about the response code and the content?


>So, the run for the intranet URL is not working.
>How do I get swishspider to run on the intranet also?

Find out what's blocking the request.


>Can anyone please shed some light on this one ?
>
>>$url =~ s/http\:\/\/www\.losangeles\.af\.mil\///;
>>	into  the while loop in
>>	sub search_parse.
>Yes, the above is Perl code.  It blanks out
>www.losangeles.af.mil from the $url variable.

Yes, I know what it does.  I just don't know what that applies to.  Some
CGI script you are running?

If all the slashes make you dizzy then you might try:

  $url =~ s[\Qhttp://www.losangeles.af.mil][];

\Q is probably not needed.
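For comparison, the same prefix-stripping can be sketched in Python (the page path below is made up for illustration; re.escape plays the role of Perl's \Q):

```python
import re

prefix = "http://www.losangeles.af.mil"
url = "http://www.losangeles.af.mil/library/news.html"  # hypothetical page

# re.escape quotes the dots and slashes, like Perl's \Q.
# Note: re.sub replaces every match, whereas Perl's s/// without /g
# replaces only the first occurrence.
stripped = re.sub(re.escape(prefix), "", url)
print(stripped)  # -> /library/news.html
```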



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Wed May 1 23:46:03 2002