On 1/10/07, Bill Moseley <email@example.com> wrote:
> On Wed, Jan 10, 2007 at 06:48:24AM -0800, James wrote:
> > Thanks, I'll check out the link. By the way, I did take the time to
> > read up on UTF-8 and user agents. But since there is a plethora of
> > information and since you guys are the experts, I am asking you
> > because I figure that you will be able to speed up my learning and/or
> > point me to some information that you are already aware of. That's
> > why novices seek out help from experts in forums and discussion
> > groups, right? Believe me, I have spent hours and hours already,
> > before even posting, trying to find useful information, even on
> > Mozilla's own site.
> Yes, it's a real time killer trying to learn this stuff. Seems like
> that's a big chunk of my day. Ignore my early morning sarcasm -- the
> list archives are full of it.
If that's an apology, I accept. :-) I do highly value the
information that you provide, Bill. I think you are a brilliant man.
> > > and in regard to the user agent question: I believe that one reason bots
> > > identify themselves as particular user agents is because they want to receive
> > > the same responses that the server would hand out to those non-bot agents.
> > So, is this a real benefit to the Swish-e Spider and how would it be
> > accomplished?
> Most people are indexing sites they run, so they know if their content
> looks at the user agent. You might have an intranet with content that you
> don't control that checks the agent string, but you could see that
> when indexing (by having the spider tell you files rejected due to
> robots exclusion).
> Google spiders everyone so it fakes the UA for those misguided
> How would it be accomplished? You mean how to set the agent string?
> agent => "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)",
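For anyone reading this in the archive: the agent setting quoted above
just controls the User-Agent header sent with each request. A minimal
sketch of the same idea in Python, not the Swish-e spider's own code.
The helper name build_request and the example.com URL are made up for
illustration, and nothing is actually fetched:

```python
from urllib.request import Request

# Agent string copied from the message above.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"


def build_request(url, agent):
    """Return a Request that would send `agent` as its User-Agent header.

    Hypothetical helper for illustration only; no network access happens
    until the request is passed to urlopen().
    """
    return Request(url, headers={"User-Agent": agent})


# example.com is a placeholder URL, not a site from this thread.
req = build_request("http://example.com/", GOOGLEBOT_UA)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

The server only ever sees that header text, so a spider can claim to
be any agent without a browser being involved.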
So, you think that it is a case of them just putting that information
into their user agent string to trick the server into
thinking they are viewing through a browser? That's an interesting
thought. So, maybe they aren't running any spiders through browsers.
Or is it still possible that they do run the spider through a browser?
> Bill Moseley
Thanks for your time, much appreciated,
Received on Wed Jan 10 07:14:08 2007