Re: User-Agent: Mozilla/2.0 and Mozilla/5.0 (compatible

From: James <swish.enhanced(at)not-real.gmail.com>
Date: Wed Jan 10 2007 - 15:14:08 GMT
On 1/10/07, Bill Moseley <moseley@hank.org> wrote:
> On Wed, Jan 10, 2007 at 06:48:24AM -0800, James wrote:
> > Thanks, I'll check out the link.  By the way, I did take the time to
> > read up on UTF-8 and user agents.  But since there is a plethora of
> > information and since you guys are the experts, I am asking you
> > because I figure that you will be able to speed up my learning and/or
> > point me to some information that you are already aware of.  That's
> > why novices seek out help from experts in forums and discussion
> > groups, right?  Believe me, I have spent hours and hours already,
> > before even posting, trying to find useful information, even on
> > Mozilla's own site.
>
> Yes, it's a real time killer trying to learn this stuff.  Seems like
> that's a big chunk of my day.  Ignore my early morning sarcasm -- the
> list archives are full of it.

If that's an apology, I accept.  :-)  I do highly value the
information that you provide, Bill.  I think you are a brilliant man.

> > > and in regard to the user agent question: I believe that one reason bots
> > > identify themselves as particular user agents is because they want to receive
> > > the same responses that the server would hand out to those non-bot agents.
> >
> > So, is this a real benefit to the Swish-e Spider and how would it be
> > accomplished?
>
> Most people are indexing sites they run, so they know if their content
> looks at user agent.  You might have an intranet with content that you
> don't control that checks the agent string, but you could see that
> when indexing (by having the spider tell you files rejected due to
> robots exclusion).

Good points.

>
> Google spiders everyone so it fakes the UA for those misguided
> programmers.
>
> How would it be accomplished?  You mean how to set the agent string?
>
>     agent => "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)",

So, you think it is just a case of them putting that information
into their user agent string to trick the server into
thinking they are viewing through a browser?  That's an interesting
thought.  So, maybe they aren't running any spiders through browsers.
Or is it still possible that they do run the spider through a browser?
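
For reference, the agent string is only the value of the User-Agent
header the spider sends with each request, so setting it does not
involve a browser. Below is a minimal sketch of where the line Bill
quotes might sit in a spider.pl config file. It assumes the usual
@servers list of hashes; the file name, base_url, and email values are
placeholders, not taken from this thread.

```perl
# SwishSpiderConfig.pl (assumed file name; adjust to your setup)
@servers = (
    {
        # Placeholder values for illustration only
        base_url => 'http://intranet.example.com/',
        email    => 'admin@example.com',

        # Agent string exactly as quoted above; the server sees only this text
        agent    => 'Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)',
    },
);

1;  # a config file loaded by Perl should end with a true value
```

A server that branches on the agent string would then treat the spider
like any other client sending that header.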

>
> --
> Bill Moseley

Thanks for your time, much appreciated,

James
Received on Wed Jan 10 07:14:08 2007