> Huh? Why is
>
> Path-Name: http://arena.internet2.edu:80/sample.htm
> Content-Length: 33
> Last-Mtime: 1013569857
> <HTML>Sample document</HTML>
>
> showing up? That's stdout from the spider.cgi script that should be
> captured by swish that's running the spider. You will note that was not in
> my example.
>
I did just notice that. I'm curious about how swish reads from stdout.
I can capture the web documents to be indexed in one file by putting this in
the swish config file:
SwishProgParameters spider.pl>output.txt
Then the file output.txt looks something like this:
Path-Name: http://arena.internet2.edu:80/index.html
Content-Length: 17774
Last-Mtime: 1011279959
<HTML>....code for page here...</HTML>
Path-Name: http://arena.internet2.edu:80/html/contribute.html
Content-Length: 11467
Last-Mtime: 1011279964
<HTML>....more html code here...</HTML>
..etc for all web pages spidered
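
To make the layout of that file concrete, here is a minimal sketch in Python that splits such a capture back into individual documents. It rests on two assumptions that are not visible in the abbreviated listing above: that a blank line separates each header block from its body, and that Content-Length counts the bytes of the body. The function name read_records is made up for illustration, and the sample record uses a computed length rather than the value from the listing.

```python
import io

def read_records(stream):
    """Yield (headers, body) pairs from a binary stream of captured spider output."""
    while True:
        line = stream.readline()
        # Skip any blank lines left between records.
        while line in (b"\n", b"\r\n"):
            line = stream.readline()
        if not line:
            return  # end of file
        headers = {}
        # Header lines run until the first blank line (assumed separator).
        while line not in (b"", b"\n", b"\r\n"):
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip()] = value.strip()
            line = stream.readline()
        # Content-Length is assumed to be the size of the body in bytes.
        body = stream.read(int(headers["Content-Length"]))
        yield headers, body

# Illustrative record only; the length is computed, not taken from real output.
body = b"<HTML>Sample document</HTML>"
sample = b"".join([
    b"Path-Name: http://arena.internet2.edu:80/sample.htm\n",
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\n",
    b"Last-Mtime: 1013569857\n",
    b"\n",
    body,
])

for headers, doc in read_records(io.BytesIO(sample)):
    print(headers["Path-Name"], len(doc))
```

If those assumptions hold, each Path-Name header marks the start of a new document, so one file can carry any number of pages.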
Is there some way (a function call in swish?) to get swish to read
output.txt as if it were being passed directly from spider.pl on stdout, so
that the effect (multiple web pages indexed) would be the same? Thanks.
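
One thing I could imagine trying, sketched here in Python: a tiny stand-in program that just copies the captured file to stdout unchanged. This assumes swish only looks at the bytes arriving on the program's stdout and does not care which program produced them; I have not confirmed that. The script name replay.py and the function replay are made up for illustration.

```python
import shutil
import sys

def replay(path, out=None):
    """Copy a previously captured spider output file to a binary stream."""
    # Default to the raw stdout buffer so no bytes are re-encoded on the way.
    out = out if out is not None else sys.stdout.buffer
    with open(path, "rb") as src:
        shutil.copyfileobj(src, out)
    out.flush()

# Hypothetical invocation: python replay.py output.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    replay(sys.argv[1])
```

Copying in binary mode matters if the Content-Length values are byte counts, since any newline translation would throw them off.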
Adam
Received on Wed Feb 13 18:42:26 2002