Bill,
Thanks for the response.
I just ran ./swishspider . http://swish-e.org/index.html from the src
directory.
It ran correctly, because I get:
drwxr-xr-x 3 root staff 1536 May 1 16:21 .
drwxr-xr-x 8 root staff 512 Apr 30 09:42 ..
-rw-r--r-- 1 root other 5321 May 1 16:21 ..contents
-rw-r--r-- 1 root other 638 May 1 16:21 ..links
-rw-r--r-- 1 root other 14 May 1 16:21 ..response
in the src directory.
But if I run the following (from the src directory):
./swishspider . http://my-intranet-server-name/tmp.html
the content of ..links is unchanged, so the run against the intranet URL
is not working.
How do I get swishspider to work with our intranet as well?
I searched the discussion group for the text "swishspider intranet" and
found 2 links, but neither describes a problem like mine.
I also tried swishspider with our intranet IP address in the URL, and
..links is still unchanged.
Can anyone please shed some light on this?
>$url =~ s/http\:\/\/www\.losangeles\.af\.mil\///;
> into the while loop in
> sub search_parse.
Yes, the above is Perl code. It strips the leading
http://www.losangeles.af.mil/ from the $url variable.
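For anyone following along, here is a minimal sketch of what that
substitution does on its own. The strip_host wrapper name is hypothetical
(it is not in the cgi-bin script); only the s/// line is taken from the
script above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution added to search_parse.
# The regex is copied unchanged: it deletes the first occurrence of
# "http://www.losangeles.af.mil/" in $url (it is not anchored to the start).
sub strip_host {
    my ($url) = @_;
    $url =~ s/http\:\/\/www\.losangeles\.af\.mil\///;
    return $url;
}

print strip_host('http://www.losangeles.af.mil/tmp.html'), "\n";  # prints "tmp.html"
```

URLs on any other host pass through unchanged, since the pattern will not
match them.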
-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org]
Sent: Wednesday, May 01, 2002 3:56 PM
To: KETUNG.HSIAO@LOSANGELES.AF.MIL; Multiple recipients of list
Subject: Re: [SWISH-E] HTTP Crawler
At 03:43 PM 05/01/02 -0700, Hsiao Ketung Contr 61 CS/SCBN wrote:
>I've been trying to get swish-e HTTP crawler working for the last 2 days.
>The HTTP crawler works if the IndexDir is set to a URL on my own server
>where I'm running the swish-e.
>
>It's when I set the IndexDir to URL other than my own server that I get
>"no word indexes" type of output.
If you are using the -S http method, then swish uses a Perl helper
program called swishspider. You can run this program on its own to see
whether it is fetching docs.
~/swish-e/src > ./swishspider
Usage: SwishSpider localpath url
~/swish-e/src > ./swishspider . http://swish-e.org/index.html
~/swish-e/src > ll -t | head
total 52672
-rw-r--r-- 1 lii users 5321 May 1 15:52 ..contents
-rw-r--r-- 1 lii users 638 May 1 15:52 ..links
-rw-r--r-- 1 lii users 14 May 1 15:52 ..response
That will tell you whether it can fetch the remote doc.
>Also, I have to modify the Perl script in cgi-bin to make the HTTP crawler
>results show up correctly. I have to add this line:
>$url =~ s/http\:\/\/www\.losangeles\.af\.mil\///;
> into the while loop in
> sub search_parse.
Don't really follow that. You may be describing a cgi script I'm not
familiar with.
--
Bill Moseley
mailto:moseley@hank.org
Received on Wed May 1 23:38:10 2002