Re: Swish-e indexing the same file multiple times

From: <moseley(at)>
Date: Fri May 16 2003 - 11:35:04 GMT
On Fri, May 16, 2003 at 02:26:06AM -0700, A.Little wrote:

> I'm in the process of setting up swish-e to index various websites and I've
> come across a little problem in that swish-e will index
> and as 2
> separate pages, even when index.html is the default page for

There are a few ways to avoid that if you are using the -S prog "" 

One is to enable MD5 checking, which skips any document whose content has already been indexed under another URL.
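Here is a minimal Perl sketch of the idea behind MD5 checking, not spider.pl's own code: hash each fetched document's body with Digest::MD5 and skip any body whose digest has already been seen, so "/" and "/index.html" serving the same page are indexed once. The `should_index` helper and the example.com URLs are hypothetical. (In the spider.pl versions I know of, this is switched on with the `use_md5` option in the server config and needs Digest::MD5 installed; check the docs for your copy.)

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical sketch: remember the MD5 digest of every document body
# and refuse to index a body whose digest was already seen.
my %seen_digest;    # digest => first URL that produced it

sub should_index {
    my ( $url, $content ) = @_;
    my $digest = md5_hex($content);
    return 0 if exists $seen_digest{$digest};    # duplicate content
    $seen_digest{$digest} = $url;                # first time we see it
    return 1;
}

# Made-up fetch results: "/" and "/index.html" return the same page.
my $page    = "<html><body>Welcome</body></html>";
my @fetched = (
    [ 'http://www.example.com/',           $page ],
    [ 'http://www.example.com/index.html', $page ],
    [ 'http://www.example.com/about.html', "<html><body>About</body></html>" ],
);

# grep walks the list in order, so the first URL for each body wins.
my @indexed = grep { should_index( $_->[0], $_->[1] ) } @fetched;
print "$_->[0]\n" for @indexed;
```

Note that MD5 checking only catches byte-identical responses; a page that embeds something like a timestamp would hash differently at each URL.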

The other way is to use a "test_url" callback that appends "index.html" to 
every URL ending in "/":

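    test_url => sub {
        my ( $uri, $server ) = @_;
        my $path = $uri->path;
        # Rewrite directory URLs (ending in "/") to point at index.html
        $uri->path( $path . "index.html" )
            if $path =~ m[/$];
        return 1;    # true means "go ahead and spider this URL"
    },
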
I didn't test that just now, so you may need to tweak it.  Don't use this 
approach if the default index file might have a different name (say, 
index.htm), since the rewritten URLs would then point at pages that don't exist.
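Here is the test_url callback exercised on its own, as a hedged sketch. It assumes spider.pl calls the sub with a URI object plus a server hash and treats a true return as "spider this URL"; the empty `{}` stands in for that hash, and the example.com URLs are made up. The explicit `return 1` matters: without it, for a URL that doesn't end in "/", the sub's value would be the failed match's false result, so ordinary pages could be skipped.

```perl
use strict;
use warnings;
use URI;

# The callback from the message, completed. Assumption: spider.pl passes
# ($uri, $server) and keeps the URL when the sub returns true.
my $test_url = sub {
    my ( $uri, $server ) = @_;
    my $path = $uri->path;
    $uri->path( $path . "index.html" )    # modify the URI in place
        if $path =~ m[/$];
    return 1;                             # keep every URL
};

my %result;    # original URL => [ rewritten URL, keep flag ]
for my $url (
    'http://www.example.com/',
    'http://www.example.com/docs/',
    'http://www.example.com/about.html',
) {
    my $uri  = URI->new($url);
    my $keep = $test_url->( $uri, {} );    # {} stands in for the server hash
    $result{$url} = [ $uri->as_string, $keep ];
    print "$url -> ", $uri->as_string, ( $keep ? " (spider)" : " (skip)" ), "\n";
}
```

Because the URI object is changed in place, "/" and "/index.html" end up as the same URL before fetching, which is what lets the spider's usual already-seen check drop the duplicate.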

Bill Moseley
Received on Fri May 16 11:35:11 2003