Re: SWISH-E index limits

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Apr 22 2002 - 17:24:34 GMT
At 10:03 AM 04/22/02 -0700, Linda DeBoer wrote:
>	Whenever I run swish-e against a site which has a url pointing back
>to the home page, it loops.

You don't mean "loop" in that it indexes the same URL more than once, right?

I don't know how to make -S http method do that.  Any robots.txt tricks?

But if you are using 2.1-dev and the -S prog method with spider.pl, then
it's rather easy to do this.

In the config you can say:

  test_url => sub {
      my $uri = shift;
      return $uri->path =~ m!^/some/path!;
  }

This just says that every path must begin with /some/path.  Note that the
pattern has no trailing slash, so it also matches paths such as
/some/pathname.
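
For context, here is a rough sketch of how that callback might sit inside a
complete spider.pl config file.  The @servers layout and the base_url and
email keys are written from memory of the spider.pl documentation, so check
them against the docs shipped with your version.  The host name, email
address, and path are placeholders.  The pattern is tightened slightly so
that it accepts /some/path and anything below it, but not a sibling such as
/some/pathname.

```perl
# Hypothetical spider.pl config file (a sketch, not a tested setup).
# Host, email, and path below are placeholders.
@servers = (
    {
        base_url => 'http://www.example.com/some/path/',
        email    => 'webmaster@example.com',

        # Called for each URL found; return true to spider it,
        # false to skip it.
        test_url => sub {
            my $uri = shift;    # URI object for the candidate link

            # Accept /some/path itself or anything under /some/path/,
            # but reject siblings like /some/pathname.
            return $uri->path =~ m!^/some/path(?:/|$)!;
        },
    },
);

1;    # config file must return a true value
```

Assuming the file is saved as spider.conf (a made-up name), it would then be
passed to spider.pl as the argument that swish-e hands to the -S prog program.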

Another option, which would be fast, would be to run another web
server/virtual host on a different port, and change the document root.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Apr 22 17:24:39 2002