Re: duplicate results

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Dec 02 2002 - 14:04:36 GMT

On Mon, 2 Dec 2002 tom12@bluemail.ch wrote:

> Hello
> 
> Is there a possibility in swish-e to eliminate repeated urls when indexing
> or searching? So when searching with the keyword Baghdad it will return
> www.cnn.com and http://www.cnn.com/2002/WORLD/meast/12/02/sproject.irq.inspectors/index.html.
> So it would be nice if I received only the last one. How do I
> do that? Unfortunately, I wasn't able to find any information in the documentation.

Why do you say those are duplicate URLs?

During spidering (with -S prog and spider.pl) you can use MD5 checksums
to avoid duplicate content.  You can also simply reject some URLs from
indexing in the spider config file.
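
To illustrate the two ideas above, here is a rough sketch in Python.  It is
not spider.pl's code or its config syntax (spider.pl is configured in Perl);
the function names, the rejected_paths parameter, and the sample pages are
all made up for the example.  It skips any URL whose path is on a reject
list, then keeps only the first page seen for each MD5 digest of the content.

```python
import hashlib
from urllib.parse import urlsplit


def content_digest(body: bytes) -> str:
    """Return the MD5 hex digest used as a fingerprint of the page content."""
    return hashlib.md5(body).hexdigest()


def is_rejected(url: str, rejected_paths=("/",)) -> bool:
    """Hypothetical URL filter: reject a URL whose path is on the list.

    An empty path (http://host with no trailing slash) is treated as "/".
    """
    path = urlsplit(url).path or "/"
    return path in rejected_paths


def dedupe(pages, rejected_paths=("/",)):
    """Return the URLs worth indexing from (url, body) pairs.

    A page is dropped if its URL is rejected, or if an earlier page
    already produced the same content digest.
    """
    seen = set()
    kept = []
    for url, body in pages:
        if is_rejected(url, rejected_paths):
            continue
        digest = content_digest(body)
        if digest in seen:
            continue  # same content already kept under another URL
        seen.add(digest)
        kept.append(url)
    return kept


# Made-up sample data: a front page plus two URLs serving identical content.
pages = [
    ("http://www.cnn.com", b"front page"),
    ("http://www.cnn.com/a.html", b"story text"),
    ("http://www.cnn.com/b.html", b"story text"),
]
kept = dedupe(pages)
```

Note that the checksum only catches pages whose content is byte-for-byte the
same.  A front page and an article page have different content, so only the
URL filter would keep the front page out of the results.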


-- 
Bill Moseley moseley@hank.org
Received on Mon Dec 2 14:06:25 2002