On Mon, Jan 08, 2007 at 05:51:33AM -0800, James wrote:
> Thanks, Brad! Yes, I had seen those lines in SwishSpiderConfig.pl before.
> I am also wondering where the "2.2" is being generated from (that I see in
> the access logs). I always see swish-e spider 2.2 http://swish-e.org.
> ..I'll be curious to get Bill's response to this, to confirm. I am not
> confident that this is the total answer, since I always see a whole lot
> written in the access logs from Yahoo, MSN and Google and yet their
> UserAgent is just a one word (short) term to exclude (like the psbot) in the
> Robots.txt. So, it seems there is more to this.
Good point -- that agent string has not been updated in quite a while.
I guess I would have just tried it to see what happens, or looked at
the source:
http://search.cpan.org/src/GAAS/libwww-perl-5.805/lib/WWW/RobotRules.pm
A quick look around shows:
robots.txt is parsed as:

    elsif (/^\s*User-Agent\s*:\s*(.*)/i) {
        $ua = $1;
        $ua =~ s/\s+$//;

    #
    # Returns TRUE if the given name matches the
    # name of this robot
    #
    sub is_me {
        my($self, $ua_line) = @_;
        my $me = $self->agent;
        # See whether my short-name is a substring of the
        # "User-Agent: ..." line that we were passed:
        if(index(lc($me), lc($ua_line)) >= 0) {
            LWP::Debug::debug("\"$ua_line\" applies to \"$me\"")
                if defined &LWP::Debug::debug;
            return 1;
        }
        else {
            LWP::Debug::debug("\"$ua_line\" does not apply to \"$me\"")
                if defined &LWP::Debug::debug;
            return '';
        }
    }

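
To make the matching rule easier to experiment with, here is a rough Python transliteration of the two excerpts above. It is only a sketch, not the module's code: the function names parse_user_agent and is_me_sketch are made up for illustration, and it assumes that the name returned by $self->agent is the full string James sees in his access logs, which I have not checked. As quoted, the test asks whether the value from the robots.txt "User-Agent:" line occurs, case-insensitively, inside the robot's own name.

```python
import re
from typing import Optional

# Same pattern as the quoted Perl: /^\s*User-Agent\s*:\s*(.*)/i
USER_AGENT_RE = re.compile(r"^\s*User-Agent\s*:\s*(.*)", re.IGNORECASE)


def parse_user_agent(line: str) -> Optional[str]:
    """Return the value of a robots.txt "User-Agent:" line, or None.

    Trailing whitespace is stripped, like the Perl `$ua =~ s/\\s+$//`.
    """
    match = USER_AGENT_RE.match(line)
    if match is None:
        return None
    return match.group(1).rstrip()


def is_me_sketch(agent_name: str, ua_line: str) -> bool:
    """Mirror `index(lc($me), lc($ua_line)) >= 0`.

    True when the robots.txt value is a case-insensitive substring
    of the robot's own agent name.
    """
    return ua_line.lower() in agent_name.lower()


# Assumption: the agent name is the string seen in the access logs.
AGENT = "swish-e spider 2.2 http://swish-e.org"

for robots_line in ("User-Agent: swish-e", "User-Agent: psbot"):
    value = parse_user_agent(robots_line)
    print(robots_line, "->", is_me_sketch(AGENT, value))
```

Under that assumption, a short robots.txt entry such as "swish-e" applies to the spider, while a robots.txt entry that is longer than the robot's own name does not, because the substring test only runs in one direction.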
--
Bill Moseley
moseley@hank.org
Received on Mon Jan 8 06:46:55 2007