Incorrect behavior for swishspider script

From: Andrew Ho <andrew(at)>
Date: Sat Apr 08 2000 - 01:42:46 GMT

The swishspider script that comes with SWISH-E has a slight error: it will
not look for or report links in any document whose content type is not
exactly "text/html". Unfortunately, this means that a page with
this perfectly valid HTTP 1.1 header:

  Content-type: text/html; charset=ISO-8859-1

does not get indexed. A quick fix is to change line 50 of the swishspider
script from:

  if( $response->header("content-type") eq "text/html" ) {

to:

  if( $response->header("content-type") =~ m(text/html) ) {
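
The pattern match above is deliberately loose, since it accepts any header
value that contains "text/html" anywhere. A stricter check could compare
only the media type and ignore parameters such as charset. The following is
a minimal sketch of that idea, not code from swishspider; the helper name
is_html is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of swishspider): returns 1 when the media
# type of a Content-Type header value is text/html, ignoring parameters
# such as charset and ignoring letter case; returns 0 otherwise.
sub is_html {
    my ($content_type) = @_;
    return 0 unless defined $content_type;

    # Everything before the first ";" is the media type.
    my ($media_type) = split /;/, $content_type, 2;
    return 0 unless defined $media_type;

    # Trim surrounding whitespace before comparing.
    $media_type =~ s/^\s+|\s+$//g;
    return lc($media_type) eq 'text/html' ? 1 : 0;
}

# Example: the header from the message above is accepted.
print is_html("text/html; charset=ISO-8859-1") ? "html\n" : "not html\n";
```

With a helper like this, the test on line 50 would presumably become
is_html( $response->header("content-type") ). Depending on the installed
libwww-perl version, $response->content_type may already return the media
type lowercased with its parameters removed, which would make the helper
unnecessary.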

On another note, perhaps there should be a configuration option to set the
full path AND FILENAME of the spidering program, such that the spidering
program does not need to be explicitly called "swishspider" (if, for
example, I wanted to write an intelligent spider of my own that knows the
structure of my site).

Or, at the very least, there should be some documentation about the
interaction between the spider program and the SWISH-E indexing program.



