
Re: HTTP Crawler

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed May 01 2002 - 23:01:17 GMT
At 03:43 PM 05/01/02 -0700, Hsiao Ketung Contr 61 CS/SCBN wrote:
>I've been trying to get the swish-e HTTP crawler working for the last 2 days.
>The HTTP crawler works if the IndexDir is set to a URL on my own server
>where I'm running swish-e.
>
>It's when I set the IndexDir to a URL other than my own server that I get
>"no word indexes" type of output.

If you are using the -S http method, then swish is using a Perl helper
program called swishspider. You can run this program by itself to see
whether it can fetch documents.

~/swish-e/src > ./swishspider
Usage: SwishSpider localpath url

~/swish-e/src > ./swishspider . http://swish-e.org/index.html

~/swish-e/src > ll -t | head
total 52672
-rw-r--r--   1 lii      users        5321 May  1 15:52 ..contents
-rw-r--r--   1 lii      users         638 May  1 15:52 ..links
-rw-r--r--   1 lii      users          14 May  1 15:52 ..response

That will tell you whether it can fetch the remote document.
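
If the fetch worked, ..contents should hold the fetched document, ..links
the extracted links, and ..response the HTTP status and content type. On
my copy that's a semicolon-separated pair, though the exact format may
differ in your version:

~/swish-e/src > cat ..response
200;text/html

Anything other than a 200 there points at a network or server problem
rather than at swish itself.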



>Also, I have to modify the Perl script in cgi-bin to make the HTTP crawler
>results show up correctly. I have to add this line:
>    $url =~ s/http\:\/\/www\.losangeles\.af\.mil\///;
>into the while loop in sub search_parse.

Don't really follow that.  You may be describing a CGI script I'm not
familiar with.
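
But if the goal is just to strip a base URL prefix from each result, you
can let Perl escape the metacharacters for you. A rough sketch, since I
haven't seen the script in question:

    my $base = 'http://www.losangeles.af.mil/';
    $url =~ s/^\Q$base\E//;   # \Q..\E quotes the dots and slashes for us

The ^ anchor also makes sure only a leading match is removed, not one
buried somewhere in a query string.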


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Wed May 1 23:01:28 2002