Hi,

Is there a way to algorithmically calculate the similarity between two chunks of HTML as some sort of index? Perhaps a float value between 0 and 1 where 1 is exactly the same and 0 is 100% different? I'm trying to remove very similar documents from our swish index.

I'd really appreciate any help you can offer because I've been struggling with this for some time.

Thanks,
Mark.

Received on Sat Feb 5 23:32:40 2005
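
To make the kind of index asked about above concrete, here is a rough sketch only, not a recommendation from the post itself. It assumes Python and its standard library: the tags are dropped, the remaining text is split into lowercase words, and `difflib.SequenceMatcher.ratio()` gives a float between 0 and 1. The names `html_words` and `html_similarity` are made up for illustration, and comparing word sequences this way is just one possible definition of "similar".

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML chunk and ignores the tags.

    Note: text inside <script> and <style> is also collected as-is.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_words(html):
    """Return the visible text of an HTML chunk as a list of lowercase words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts).lower().split()


def html_similarity(a, b):
    """Hypothetical helper: 1.0 for identical word sequences, 0.0 for no overlap."""
    matcher = SequenceMatcher(None, html_words(a), html_words(b), autojunk=False)
    return matcher.ratio()
```

Two chunks could then be treated as near-duplicates when `html_similarity(a, b)` exceeds some threshold; the right cut-off is a judgment call and would have to be tuned against real documents. Comparing every pair this way is quadratic in the number of documents, so it is only a starting point for a large index.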