On Fri, May 16, 2003 at 02:26:06AM -0700, A.Little wrote:
> I'm in the process of setting up swish-e to index various websites and I've
> come across a little problem in that swish-e will index
> http://www.mydomain.com/ and http://www.mydomain.com/index.html as 2
> separate pages, even when index.html is the default page for
> www.mydomain.com.
There are a few ways to avoid that if you are using the -S prog "spider.pl"
program.
One is to enable MD5 checking -- that will avoid indexing duplicate content.
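For the MD5 route, a minimal sketch of a spider.pl config file might look like the following. It assumes the "use_md5" setting described in the spider.pl documentation; the base_url comes from the question, the email address is a made-up placeholder, and the exact file layout can differ between swish-e versions, so check the docs that ship with yours.

```perl
# SwishSpiderConfig.pl -- sketch only, adjust for your swish-e version
@servers = (
    {
        base_url => 'http://www.mydomain.com/',
        email    => 'swish@mydomain.com',   # hypothetical contact address
        use_md5  => 1,   # skip documents whose content digest was already seen
    },
);

1;
```

Note that with this approach the spider still fetches both URLs; it just doesn't index the second copy of the same content.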
The other way would be to use a "test_url" function to add "index.html" to
all URLs ending in "/".
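test_url => sub {
    my $uri = shift;
    my $path = $uri->path;
    # Treat ".../" as ".../index.html" so both forms map to one URL
    $uri->path( $path . "index.html" )
        if $path =~ m[/$];
    return 1;  # a true return tells spider.pl to keep the URL
},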
I didn't test that just now, so you might need to tweak it. You don't want to
do that if it's possible the server's default index file is named something
other than index.html.
--
Bill Moseley
moseley@hank.org
Received on Fri May 16 11:35:11 2003