Thanks Bill!
--- Bill Moseley <moseley@hank.org> wrote:
> On Mon, Oct 31, 2005 at 06:49:46AM -0800, J Robinson wrote:
> > The actual complaint is that the spider is indexing
> > pages it shouldn't.
>
> Right -- I had this complaint once and it turned out to be a syntax
> error in the robots.txt file.
>
>
> > I'll check out the 'skipped' debug flag -- is there
> > another that actually shows urls being compared
> > against the robots.txt contents?
>
> The spider just uses LWP::RobotUA, which uses WWW::RobotRules. Those
> are widely used, so they should work as expected.
>
> Try setting in spider:
>
> use LWP::Debug '+debug';
>
> although you might get more info than you want if spidering a lot of
> files. I typically just hack away at the module and throw in prints to
> see what's happening.
>
> --
> Bill Moseley
> moseley@hank.org
>
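For anyone who wants to see URLs compared against the robots.txt contents directly, here is a minimal sketch that calls WWW::RobotRules on its own, outside the spider. The agent name, the example.com URLs, the sample robots.txt text, and the check_urls helper are all made up for illustration; substitute the real file and the agent string your spider sends. It only mirrors what LWP::RobotUA is described as doing above, so treat it as a debugging aid rather than the spider's actual code path.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# check_urls is a hypothetical helper, not part of swish-e or LWP.
# It parses the given robots.txt text and reports each URL's status.
sub check_urls {
    my ($agent, $robots_url, $robots_txt, @urls) = @_;

    my $rules = WWW::RobotRules->new($agent);
    $rules->parse($robots_url, $robots_txt);

    # allowed() is false only when a Disallow rule matches the URL's path.
    return map { [ $_, $rules->allowed($_) ? 'allowed' : 'disallowed' ] } @urls;
}

# Sample robots.txt content (an assumption; paste the real file here).
my $robots_txt = <<'EOT';
User-agent: *
Disallow: /private/
EOT

my @results = check_urls(
    'ExampleSpider/1.0',                  # hypothetical agent name
    'http://www.example.com/robots.txt',  # where the rules came from
    $robots_txt,
    'http://www.example.com/index.html',
    'http://www.example.com/private/notes.html',
);

printf "%-12s %s\n", $_->[1], $_->[0] for @results;
```

If a URL the spider indexed shows up as "allowed" here, the problem is more likely in the robots.txt syntax than in the spider. Note that allowed() also returns a true value for hosts whose robots.txt has not been parsed, so test URLs need to be on the same host as the robots.txt URL passed in.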
Received on Mon Oct 31 07:51:49 2005