Re: SWISH-E index limits

From: Gerald Klaas <gklaas(at)>
Date: Mon Apr 22 2002 - 17:50:51 GMT

Bill Moseley wrote:
> At 10:03 AM 04/22/02 -0700, Linda DeBoer wrote:
> >       Whenever I run swish-e against a site which has a url pointing back
> >to the home page, it loops.
> You don't mean "loop" in that it indexes the same URL more than once, right?

It might if there is an equivalent URL not configured with the
EquivalentServer directive.  I.e. and
are two URLs for the same page. So wouldn't you need () in your config file?

Or if the links back to the homepage are not consistent, you might
also wind up with things like () being indexed separately.
And then there is the possibility of case-insensitive URLs if the host
is MS-based.
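
To illustrate how two spellings of one page can look like different
documents, here is a minimal sketch using the URI module from CPAN. The
helper name normalize_url, the example.com host and the
$fold_path_case flag are assumptions made up for this example; this is
not how swish-e itself compares URLs.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: reduce a URL to a single spelling before comparing.
# $fold_path_case is an assumption for hosts that ignore case in paths.
sub normalize_url {
    my ($url, $fold_path_case) = @_;

    # canonical() lowercases the scheme and host and drops a default port.
    my $uri = URI->new($url)->canonical;

    # Paths are case-sensitive in general, so only fold them on request.
    $uri->path(lc $uri->path) if $fold_path_case;

    # Fragments are never sent to the server, so ignore them.
    $uri->fragment(undef);

    return $uri->as_string;
}

# Two spellings of the same (hypothetical) page.
my %seen;
for my $url (qw(
    http://WWW.Example.COM:80/Docs/Index.html
    http://www.example.com/docs/index.html
)) {
    my $key = normalize_url($url, 1);
    print $seen{$key}++ ? "duplicate: $url\n" : "new:       $url\n";
}
```

Without some normalization like this, a plain string comparison treats
the two URLs as separate documents.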

> But, if you are using 2.1-dev, and the -S prog method with then
> it's rather easy to do this.
> In the config you can say:
>   test_url => sub {
>       my $uri = shift;
>       return $uri->path =~ m!^/some/path!;
>   }

I do this. Just like Bill says, it works like a charm.   :-)
If you want to see how I use this, you can check the 
"spider configuration template" link from
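
For anyone who wants to try the callback outside the spider first, here
is a minimal sketch that runs the test_url sub from Bill's message
against a few URLs. The example.com host and the three paths are
assumptions for illustration only, and the loop stands in for the
spider; it is not taken from my own configuration.

```perl
use strict;
use warnings;
use URI;

# The callback quoted above: accept only URLs whose path starts with /some/path.
my $test_url = sub {
    my $uri = shift;
    return $uri->path =~ m!^/some/path!;
};

# Hypothetical URLs, purely for illustration.
for my $url (qw(
    http://www.example.com/some/path/page.html
    http://www.example.com/
    http://www.example.com/other/page.html
)) {
    # The callback expects a URI object, not a plain string.
    my $verdict = $test_url->(URI->new($url)) ? 'spider' : 'skip';
    print "$verdict  $url\n";
}
```

Only the first URL passes the test, so a link back to the home page is
skipped rather than followed.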

> Another option, which would be fast, would be to run another web
> server/virtual host on a different port, and change the document root.
Interesting.  Then you'd use the ReplaceRules directive to
rewrite the URL as it goes into the index? 

Received on Mon Apr 22 17:50:56 2002