On Wed, Aug 16, 2006 at 12:45:31PM -0700, Z wrote:
> From a command prompt I tried this:
>
> >spider.pl default http://www.swish-e.org/ > output.txt
>
> The result was:
> E:\INETPUB\WWWROOT\SITE\WINDOWS\spider.pl: Reading parameters from 'default'
> Summary for: http://www.swish-e.org/
> Connection: Close: 1 (0.0/sec)
> Unique URLs: 1 (0.0/sec)

Did you read about debugging?

$ perldoc /usr/local/lib/swish-e/spider.pl | grep DEBUG
DEBUG_SKIPPED debug flag is set.
SPIDER_DEBUG when running spider.pl. You can specify any of the above
SPIDER_DEBUG=url,links spider.pl [....]
debug => DEBUG_URL | DEBUG_FAILED | DEBUG_SKIPPED,
DEBUG_* constants. The string is converted to a number only at the
Now you can use the words instead of or'ing the DEBUG_* constants

Here is the debug output for a URL that is forbidden by robots.txt:

$ SPIDER_DEBUG=failed,url,links,headers perl /usr/local/lib/swish-e/spider.pl default http://www.swish-e.org/who.html
/usr/local/lib/swish-e/spider.pl: Reading parameters from 'default'
-- Starting to spider: http://www.swish-e.org/who.html --
vvvvvvvvvvvvvvvv HEADERS for http://www.swish-e.org/who.html vvvvvvvvvvvvvvvvvvvvv
---- Request ------
HEAD http://www.swish-e.org/who.html
Accept-Encoding: gzip; deflate
---- Response ---
Status: 403 Forbidden by robots.txt
^^^^^^^^^^^^^^^ END HEADERS ^^^^^^^^^^^^^^^^^^^^^^^^^^
Summary for: http://www.swish-e.org/who.html
Connection: Close: 1 (1.0/sec)
Unique URLs: 1 (1.0/sec)
robots.txt: 1 (1.0/sec)
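
If you want to keep that trace in a file, note that a plain "> output.txt" only captures standard output. Below is a minimal sketch. It assumes the spider writes its debug and summary lines to standard error, which the summary showing up on the console in the quoted run suggests, but which I have not checked in the source. The `sh -c` command is a hypothetical stand-in for spider.pl so the snippet runs without swish-e installed.

```shell
# Stand-in for spider.pl (hypothetical): echoes the debug flags it was given
# to stderr and one line of "document" data to stdout.
stand_in='echo "debug flags: $SPIDER_DEBUG" >&2; echo "page data"'

# The VAR=value prefix sets SPIDER_DEBUG for this one command only.
# "> output.txt" captures stdout; "2> debug.log" captures stderr.
SPIDER_DEBUG=failed,url,links,headers sh -c "$stand_in" > output.txt 2> debug.log

cat debug.log    # debug flags: failed,url,links,headers
cat output.txt   # page data
```

With the real script you would put the perl invocation shown above in place of the `sh -c` call and keep the same two redirections. In cmd.exe the prefix form is not available. Running `set SPIDER_DEBUG=failed,url,links,headers` on its own line before the command does the same job, though the variable then stays set for the rest of that session.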
--
Bill Moseley
moseley@hank.org
Received on Wed Aug 16 12:53:07 2006