Re: Disallow in Robots.txt

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Jan 08 2007 - 14:46:54 GMT
On Mon, Jan 08, 2007 at 05:51:33AM -0800, James wrote:
> Thanks, Brad!  Yes, I had seen those lines in SwishSpiderConfig.pl before.
> I am also wondering where the "2.2" is being generated from (that I see in
> the access logs).  I always see swish-e spider 2.2 http://swish-e.org.
> ..I'll be curious to get Bill's response to this, to confirm.  I am not
> confident that this is the total answer, since I always see a whole lot
> written in the access logs from Yahoo, MSN and Google and yet their
> UserAgent is just a one word (short) term to exclude (like the psbot) in the
> Robots.txt.  So, it seems there is more to this.

Good point -- that agent string has not been updated in quite a while.

I would just try it and see what happens, or look at the source:

http://search.cpan.org/src/GAAS/libwww-perl-5.805/lib/WWW/RobotRules.pm

A quick look around shows:

robots.txt is parsed as:

        elsif (/^\s*User-Agent\s*:\s*(.*)/i) {
	    $ua = $1;
	    $ua =~ s/\s+$//;


#
# Returns TRUE if the given name matches the
# name of this robot
#
sub is_me {
    my($self, $ua_line) = @_;
    my $me = $self->agent;

    # See whether my short-name is a substring of the
    #  "User-Agent: ..." line that we were passed:
    
    if(index(lc($me), lc($ua_line)) >= 0) {
      LWP::Debug::debug("\"$ua_line\" applies to \"$me\"")
       if defined &LWP::Debug::debug;
      return 1;
    }
    else {
      LWP::Debug::debug("\"$ua_line\" does not apply to \"$me\"")
       if defined &LWP::Debug::debug;
      return '';
    }
}
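
To see what that index() test means in practice, here is a minimal standalone sketch. It does not call WWW::RobotRules; parse_ua_line and ua_line_applies are hypothetical helpers that only mirror the two snippets above, and the robot name "swish-e" is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the agent value out of a robots.txt line,
# using the same pattern as the parsing snippet quoted above.
sub parse_ua_line {
    my ($line) = @_;
    return undef unless $line =~ /^\s*User-Agent\s*:\s*(.*)/i;
    my $ua = $1;
    $ua =~ s/\s+$//;    # strip trailing whitespace
    return $ua;
}

# Hypothetical helper: same comparison as is_me above.
# index(STR, SUBSTR) returns -1 when SUBSTR does not occur in STR,
# so this is true when the robots.txt value occurs inside $me.
sub ua_line_applies {
    my ($me, $ua_line) = @_;
    return index(lc($me), lc($ua_line)) >= 0 ? 1 : 0;
}

# Assumed robot name, for illustration only.
my $me = 'swish-e';

for my $line (
    'User-Agent: swish-e',
    'User-Agent: Swish  ',
    'User-Agent: psbot',
    'User-Agent: swish-e spider 2.2',
) {
    my $ua = parse_ua_line($line);
    printf "%-24s %s\n", "\"$ua\"",
        ua_line_applies($me, $ua) ? 'applies' : 'does not apply';
}
```

Under that assumption, a short value in robots.txt matches because it occurs inside the robot's name, while a longer string such as the full agent line does not, since it cannot be a substring of the shorter name.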


-- 
Bill Moseley
moseley@hank.org

Received on Mon Jan 8 06:46:55 2007