
Re: Disallow in Robots.txt

From: James <swish.enhanced(at)not-real.gmail.com>
Date: Tue Jan 09 2007 - 10:22:46 GMT
Brad,

It appears that you were right, in part, after all.

changed: $server->{agent} ||= 'swish-e spider 2.2 http://swish-e.org/';
to: $server->{agent} ||= 'MyTitle http://mywebsite.com/';

changed: $ua->agent( "SwishSpider http://swish-e.org" );
to: $ua->agent( "MyTitle" );

It took a bit of poking around, but I think I got it.  I'll start testing
to confirm it actually works as expected.
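
In case it helps anyone searching the archives later: the first change goes in
SwishSpiderConfig.pl and the second in spider.pl. A minimal sketch of one
@servers entry, using the MyTitle name and mywebsite.com URL from this thread
as placeholders for your own values, might look like:

```perl
# SwishSpiderConfig.pl -- sketch of a single @servers entry
# (MyTitle and mywebsite.com are placeholders; substitute your own)
@servers = (
    {
        base_url => 'http://mywebsite.com/',
        email    => 'webmaster@mywebsite.com',
        # Sent as the User-agent header on every request; this is the
        # name other sites would match in a robots.txt "User-agent:" line.
        agent    => 'MyTitle http://mywebsite.com/',
    },
);

1;  # the config file must return a true value
```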

Thanks,

James

On 1/8/07, Brad Miele wrote:
>
> Pretty sure you can set agent in SwishSpiderConfig.pl, yep, line 143:
>
> agent       => 'swish-e spider http://swish-e.org/'
>
> regards,
>
> Brad
> ---------------------
> Brad Miele
> VP Technology
> IPNStock.com
> 866 476 7862 x902
> bmiele@ipnstock.com
>
> On Mon, 8 Jan 2007, James wrote:
>
> > Is there a way for other webmasters to disallow Swish-e from crawling
> > their site(s), and is there a way to declare what bot I am?  For
> > instance, I always put the following in my robots.txt files for my
> > websites:
> >
> > User-agent: psbot
> > Disallow: /
> >
> > Is there some kind of configuration file that declares what bot
> > (User-agent) I am when using Swish-e, and can that be changed to
> > something I customize and declare publicly so that anyone can
> > disallow my user agent?
> >
> > I ask these things in general because I know that Swish-e has a polite
> > spider, obeying robots.txt and noindex/nofollow directives.
> >
>
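
For completeness: once the agent string is changed, other webmasters can
block the spider the same way I block psbot, with a robots.txt stanza naming
the new agent (MyTitle here is just the placeholder name used above):

```
User-agent: MyTitle
Disallow: /
```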



Received on Tue Jan 9 02:22:52 2007