At 09:04 AM 04/26/02 -0700, GUEGAN Ronald wrote:
>Is there a way to detect that an HTML file has already been indexed?
>We are indexing websites where a file can be accessed in various ways:
> - http://www.mysite.com/app1/page.asp?param=1&other=0
> - http://www.mysite.com/app1/page.asp?param=1
>In the given example, both URLs could point to the same page.
If you are using the 2.1-dev version (soon to be a prerelease) with -S prog
and spider.pl, then yes, you can. That spider has an MD5 option to
fingerprint each page, so that should catch duplicates.
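
To illustrate the general idea of MD5 fingerprinting, here is a minimal
sketch in Python. It is not spider.pl's actual code; the function name
is_duplicate, the seen_digests set, and the sample page bodies are all
made up for the example. It hashes each page body and skips any page
whose digest has already been seen, whatever URL it was fetched from.

```python
import hashlib


def is_duplicate(content: bytes, seen: set) -> bool:
    """Return True if a page with identical content was already seen.

    Hypothetical helper: records the MD5 digest of each new page body
    in `seen` so later copies of the same content can be skipped.
    """
    digest = hashlib.md5(content).hexdigest()
    if digest in seen:
        return True
    seen.add(digest)
    return False


# Sample data (assumed): two URLs return the same body, a third differs.
pages = {
    "http://www.mysite.com/app1/page.asp?param=1&other=0": b"<html>same page</html>",
    "http://www.mysite.com/app1/page.asp?param=1": b"<html>same page</html>",
    "http://www.mysite.com/app1/page.asp?param=2": b"<html>another page</html>",
}

seen_digests = set()
# Keep only the first URL seen for each distinct page body.
indexed = [url for url, body in pages.items()
           if not is_duplicate(body, seen_digests)]
```

Note that this only catches byte-for-byte identical content. If the page
embeds anything that varies per request (a timestamp, a session ID), the
digests will differ and both copies will be kept.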
We discussed this just a few days ago, so you might check the list
Received on Fri Apr 26 16:27:36 2002