*** I've been having some trouble with file attachments. Just in case this
one didn't get through intact, here it is again ***
Here is a modified version of the Swish Spider that can handle FRAMESET
HTML documents.
Before trying to use it, please read the notes below:
* It works by reading through frame source links and creating a single HTML
file which is passed on to be indexed.
* Although Swish-E does not index documents in different domains, the
spidering operation that reads through the frame source links *does*. This
is because frame source HTML documents are sometimes in a different
domain.
* Any href links found in any frame source files are passed on as if they
were links off that single HTML file. Because these may be in different
domains, they may not be indexed.
* If you start your HTTP spidering with a file which you know is part of a
frame set, but which is not the root frame set file, THIS NEW SPIDER CANNOT
RECOGNISE THAT. It will *only* spider those frame set files BELOW the file
that you start with.
* PLEASE NOTE:
This version of the Swish Spider has been modified by Chris Humphries.
It comes with no guarantees.
It has been tested to a limited degree on real data.
It has not been tested exhaustively on all possible cases.
* I hope that you find this useful. If you have any problems with this new
version of the spider, please tell me, but I must warn you that I am fairly
busy most of the time and may not be able to reply to you straight away.
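
To illustrate the approach described in the notes above (follow the frame
source links, build a single combined HTML document, and pass on any href
links found along the way), here is a minimal sketch in Python. It is not
the code of the modified spider itself. The names FrameAndLinkParser,
flatten_frameset and fetch are hypothetical, and fetch stands in for
whatever retrieves a page over HTTP.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameAndLinkParser(HTMLParser):
    """Collects <frame> src values and <a> href values from one page."""

    def __init__(self):
        super().__init__()
        self.frame_srcs = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "frame" and attrs.get("src"):
            self.frame_srcs.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def flatten_frameset(url, fetch, seen=None):
    """Return (combined_html, links) for `url` and every frame below it.

    `fetch` is a hypothetical callable that maps a URL to its HTML text,
    or returns None if the page cannot be retrieved.
    """
    seen = set() if seen is None else seen
    if url in seen:  # guard against frames that refer back to each other
        return "", []
    seen.add(url)

    html = fetch(url)
    if html is None:
        return "", []

    parser = FrameAndLinkParser()
    parser.feed(html)

    parts = [html]
    links = [urljoin(url, href) for href in parser.hrefs]

    # Frame sources are followed even when they live in another domain.
    for src in parser.frame_srcs:
        sub_html, sub_links = flatten_frameset(urljoin(url, src), fetch, seen)
        parts.append(sub_html)
        links.extend(sub_links)

    return "\n".join(parts), links
```

Note that the sketch only descends from the URL it is given, so starting
it on a file in the middle of a frame set would miss the frames above it,
which is the same limitation as described in the notes. The links it
returns may point to other domains; whether those are indexed is left to
the indexer.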
Chris Humphries
29/2/2000
Received on Mon Mar 6 10:22:56 2000