Michael-
Please respond to this list instead of to me directly.
> >That's odd. I just tried running swishspider manually on that site and
> >saw that it had no problem extracting the links. What version of SWISH
> >are you running?
>
> 1.3
1.3 exactly or 1.3.x?
> Well, this is bizarre:
>
> root@lsminfo:/usr/local/etc# perl ../bin/swishspider ./test
> http://nbl.rutgers.edu/
> root@lsminfo:/usr/local/etc# ls -la test.*
> -rw-r--r-- 1 root root 9593 Sep 22 19:45 test.contents
> -rw-r--r-- 1 root root 45 Sep 22 19:45 test.response
Okay, that makes a little more sense (at least it explains why swish
doesn't have any further links to crawl). There is a known bug in the
distributed version of swish where files that have charsets in their
mime types are not properly spidered. You could try applying the
following patches:
http://sunsite.berkeley.edu/SWISH-E/Patches/spider
http://sunsite.berkeley.edu/SWISH-E/Patches/spider2
I'm not sure that would fix it, though, because when I fetch the URL you
provided I see a mime type of "text/html". The size of your response file
differs from mine, however, so perhaps your server is doing some
conditional serving.
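
To illustrate the kind of bug I mean, here is a minimal sketch in Python. It is not the actual swishspider code (which is Perl), and the function names media_type and is_html are hypothetical. It only shows why an exact string comparison against "text/html" fails once the server appends a charset parameter, and how dropping the parameters before comparing avoids that:

```python
def media_type(content_type: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    # Everything after the first ';' is a parameter (e.g. charset=...).
    return content_type.split(";", 1)[0].strip().lower()


def is_html(content_type: str) -> bool:
    """True if the header names HTML, with or without parameters."""
    return media_type(content_type) == "text/html"


if __name__ == "__main__":
    # Illustrative header values only; not taken from the server in question.
    for header in ("text/html", "text/html; charset=iso-8859-1", "text/plain"):
        naive = header == "text/html"  # the exact-match check that breaks
        print(f"{header!r}: naive={naive} tolerant={is_html(header)}")
```

If the first line of your test.response file carries something after "text/html", this is probably what you are hitting.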
moo
------------------------------------------------------------
Ron Samuel Klatchko - Software Jester
Brightmail Inc - rsk@brightmail.com
Received on Wed Sep 22 17:39:32 1999