From: Thomas Dowling <tdowling(at)>
Date: Wed Jan 10 2007 - 14:46:48 GMT
On 1/10/2007 8:15 AM, James wrote:

> Good morning again!
> I have another question.  I am pretty sure someone on this discussion list
> has some knowledge about User-Agents (bots in particular) that seem to use
> Mozilla/2.0 or Mozilla/5.0.  For instance, Ask seems to use Mozilla/2.0 and
> Google seems to use Mozilla/5.0.  Do you know what this means?  Are they
> somehow running their spider through Mozilla?  Do they have Mozilla
> installed on their server to do this somehow?  Are there advantages to
> this?

Sometime last year, Google did indeed start using a Gecko-based page
renderer with Googlebot, so it is in fact "Mozilla/5.0 (compatible...)".
They did this to crack down on index spam that used CSS to hide text
from browsers but that still showed up in Googlebot's former Lynx-based
crawler.

I don't know why Ask advertises as Mozilla/2.0, but there's a long, sad
history of sites refusing to send content if they decide your browser
isn't compatible, and all too often this decision is based on the
presence of the string "Mozilla/X.Y" in your User-Agent header (though
usually such sites look for Mozilla/4.0 and above).
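
To make the sniffing described above concrete, here is a minimal Python
sketch of the kind of check such a site might run. The function name, the
regular expression, and the sample User-Agent strings are illustrative
assumptions, not code taken from any real site or crawler.

```python
import re

# Hypothetical sniffing rule: only look at a leading "Mozilla/X.Y" token.
MOZILLA_TOKEN = re.compile(r"^Mozilla/(\d+)\.(\d+)")

def passes_sniff(user_agent, min_major=4):
    """Return True if the agent claims Mozilla/<min_major>.0 or later."""
    match = MOZILLA_TOKEN.match(user_agent)
    return bool(match) and int(match.group(1)) >= min_major

# Illustrative User-Agent strings (assumed for this example).
for agent in (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)",
    "swish-e spider",
):
    print(agent, "->", passes_sniff(agent))
```

A check like this says nothing about what the client can actually render,
which is why crawlers and browsers alike ended up claiming to be Mozilla.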

> ...I wondered if the Swish-e spider could be enhanced by doing this.
> For instance, maybe this would solve the UTF-8 issue?  Maybe it would solve
> other issues too.  I am speaking out of "ignorance" because this aspect of
> Google and Ask is not known to me.  Perhaps someone could help out in this
> area.  Bill?

If anything, the character encoding would be negotiated based on the
Accept-Charset header, not the User-Agent string.  But I'm very doubtful
that web servers respond intelligently to Accept-Charset.  And, of
course, it doesn't address any "SWISH-E + Local XML Files + UTF-8 = Oops"
problems (which is what I'm currently stuck on, f'rinstance).
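
As a rough illustration of the point above, this Python sketch builds a
request that asks for UTF-8 via Accept-Charset and, separately, reads the
charset a server declares in its Content-Type response header. The
function names, the example URL, and the spider's agent string are
assumptions made for this example; none of this is Swish-e code, and no
network request is sent.

```python
from email.message import Message
from urllib.request import Request

def build_request(url, charset="utf-8",
                  agent="Mozilla/5.0 (compatible; example-spider/0.1)"):
    """Build (but do not send) a request that states a charset preference."""
    return Request(url, headers={"Accept-Charset": charset,
                                 "User-Agent": agent})

def declared_charset(content_type):
    """Return the charset named in a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

req = build_request("http://example.com/")
print(req.header_items())
print(declared_charset("text/html; charset=UTF-8"))
```

Since a server is free to ignore Accept-Charset, a spider still has to
decode each response using whatever charset the server actually declares.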

Thomas Dowling
