Hi all,
I am trying to spider a site belonging to one of our company's divisions
that uses ASP pages. Like many ASP sites, it always redirects you when you
access it.
..
/usr/bin/time /usr/local/bin/swish-e-2.5.2 -c ./conf/site.conf -S http -i http://www.site.com/ -v 3
Parsing config file './conf/site.conf'
Indexing Data Source: "HTTP-Crawler"
Indexing "http://www.site.com/"
Now fetching [http://www.site.com/robots.txt]...Status: 404.
retrieving http://www.site.com/ (0)...
Now fetching [http://www.site.com/]...Status: 302. /Default.asp?c=1298
Skipping /Default.asp?c=1298: Wrong method or server.
..
The problem seems to be that the Location header of the 302 response
contains only a relative URI. A correctly formed 302 response carries an
absolute URI (RFC 2616, section 14.30), i.e. the log should show
302. http://www.site.com/Default.asp?c=1298
instead of just
302. /Default.asp?c=1298
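For a client that wants to be lenient, resolving the relative target is a single step: join the Location value against the URL that was requested. Below is a minimal sketch using only Python's standard `urllib.parse`; `absolutize_location` is a hypothetical helper name, not anything in swish-e.

```python
from urllib.parse import urljoin, urlsplit


def absolutize_location(request_url: str, location: str) -> str:
    """Return an absolute URI for a redirect's Location value.

    Absolute values pass through unchanged; relative ones such as
    "/Default.asp?c=1298" are resolved against the requested URL.
    """
    parts = urlsplit(location)
    if parts.scheme and parts.netloc:
        return location  # already absolute, as RFC 2616 requires
    return urljoin(request_url, location)


if __name__ == "__main__":
    print(absolutize_location("http://www.site.com/", "/Default.asp?c=1298"))
```

Run as a script, this prints `http://www.site.com/Default.asp?c=1298`, the absolute URI the server should have sent.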
I tried another ASP site and hit the same problem, so perhaps it is
common on ASP servers; I'm not sure.
Most other programs, wget for example, seem to handle this kind of
302 response anyway.
Can anyone confirm that swish-e does have this problem? If so, perhaps
swish-e should honour these sloppy redirects.
And can anyone think of a workaround?
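One possible workaround, sketched under assumptions: follow the redirect chain yourself, resolving each relative Location against the current URL, and hand the final URL to swish-e as the `-i` starting point. This only helps if later pages do not also redirect relatively. The `fetch` callable and `follow_redirects` are hypothetical names; in the demo a dict of canned responses stands in for the real site. A real `fetch` could be built on `http.client`, which does not follow redirects on its own.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# A fetch function returns (status_code, location_header_or_None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307}


def follow_redirects(url: str, fetch: Fetch, max_hops: int = 5) -> str:
    """Follow redirects, tolerating relative Location values; return the final URL."""
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url  # not a redirect: this is the page to start from
        # Resolve "/Default.asp?c=1298" against the URL just requested.
        url = urljoin(url, location)
    raise RuntimeError(f"gave up after {max_hops} redirects at {url}")


if __name__ == "__main__":
    # Canned responses standing in for the ASP site in the log above.
    responses = {
        "http://www.site.com/": (302, "/Default.asp?c=1298"),
        "http://www.site.com/Default.asp?c=1298": (200, None),
    }
    print(follow_redirects("http://www.site.com/", responses.__getitem__))
```

The demo prints `http://www.site.com/Default.asp?c=1298`; the `max_hops` limit keeps a redirect loop from running forever.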
Thanks very much,
Francis
--
-----------------------------------
Francis Vierboom
francis@galexia.com
Research Consultant
Galexia Consulting Pty Ltd
Suite 95, Jones Bay Wharf,
(Lower Deck, East Side)
26-32 Pirrama Road,
Pyrmont NSW 2009,
Australia
tel: +61 (0)2 9660 1111
fax: +61 (0)2 9660 7611
Received on Sun Jul 31 19:53:37 2005