
Re: robots.txt

From: J Robinson <jrobinson852(at)not-real.yahoo.com>
Date: Mon Oct 31 2005 - 15:51:48 GMT
Thanks Bill!

--- Bill Moseley <moseley@hank.org> wrote:

> On Mon, Oct 31, 2005 at 06:49:46AM -0800, J Robinson wrote:
> > The actual complaint is that the spider is indexing
> > pages it shouldn't.
> 
> Right -- I had this complaint once and it turned out to be a syntax
> error in the robots.txt file.
> 
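For anyone following along, a minimal robots.txt sketch (the paths here
are made up) for keeping a spider out of a couple of directories looks
like:

    User-agent: *
    Disallow: /private/
    Disallow: /tmp/

Parsers generally ignore field names they don't recognize, so a
misspelling like "Dissallow:" quietly blocks nothing -- exactly the
kind of syntax error that's easy to miss.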
> 
> > I'll check out the 'skipped' debug flag -- is there
> > another that actually shows urls being compared
> > against the robots.txt contents?
> 
> The spider just uses LWP::RobotUA, which uses WWW::RobotRules. Those
> are widely used, so they should work as expected.
> 
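To see exactly what the rules module decides, here is a small
standalone sketch (the URLs are placeholders, and the agent name is a
guess -- use whatever agent your spider config sets) that feeds a
robots.txt to WWW::RobotRules directly and asks whether given URLs are
allowed, bypassing the spider entirely:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::RobotRules;
    use LWP::Simple qw(get);

    # Rules are matched per User-agent, so this name should match
    # what the spider actually sends.
    my $rules = WWW::RobotRules->new('swishspider');

    my $robots_url = 'http://example.com/robots.txt';
    my $robots_txt = get($robots_url)
        or die "can't fetch $robots_url\n";
    $rules->parse($robots_url, $robots_txt);

    # Print the verdict for each URL in question.
    for my $url ('http://example.com/private/page.html',
                 'http://example.com/public/page.html') {
        printf "%s => %s\n", $url,
            $rules->allowed($url) ? 'allowed' : 'DISALLOWED';
    }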
> Try setting in spider:
> 
>     use LWP::Debug '+debug';
> 
> although you might get more info than you want if spidering a lot of
> files. I typically just hack away at the module and throw in print
> statements to see what's happening.
> 
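As a narrower test than debugging a full crawl, fetching a single page
through LWP::RobotUA with debugging on (again, the URL and agent name
are placeholders) shows the robots.txt request and the rule decision:

    use strict;
    use warnings;
    use LWP::Debug '+debug';
    use LWP::RobotUA;

    my $ua = LWP::RobotUA->new('swishspider', 'you@example.com');
    $ua->delay(0);    # no pause between requests while testing

    my $res = $ua->get('http://example.com/private/page.html');
    # A blocked URL should come back as 403 "Forbidden by robots.txt".
    print $res->status_line, "\n";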
> -- 
> Bill Moseley
> moseley@hank.org
> 

Received on Mon Oct 31 07:51:49 2005