Re: Identical Documents

From: Bill Moseley <moseley(at)>
Date: Thu Sep 16 2004 - 22:54:37 GMT

[I'm sending this back to the list]

On Thu, Sep 16, 2004 at 03:37:54PM -0700, Sebastian Jayaraj wrote:
> Hi Bill,
> Thanks for the quick response. It looks like the program does 
> the md5 filtering for only http like URL's.  In my case, I have the 
> documents residing on a windows server, samba mounted on a unix machine 
> on which I run the swish-e program using -S fs option to index them.
> Is there a way to run the on the local file system. The other 
> option (roundabout) I was thinking was to expose my source dirs on a 
> webserver and then run the spider to index them.

Well, if you were really lucky you might be able to "spider" locally
with file:///path/to/whatever/index.html -- but I've never tried that.

If you don't mind a tiny bit of Perl programming you could use the program and do your own MD5 checking in that program.
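
As a rough illustration of the "do your own MD5 checking" idea, here is a minimal sketch that walks a directory tree and keeps only the first file seen for each distinct MD5 digest. It is written in Python rather than Perl purely for illustration, and the same logic carries over to Perl directly. The names file_md5 and unique_files are hypothetical helpers, not part of swish-e, and the sketch leaves out how the surviving files would be handed to the indexer.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_files(root):
    """Yield paths under root whose content has not been seen before.

    Hypothetical helper: the first file with a given digest wins and
    later files with identical content are skipped.
    """
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            checksum = file_md5(path)
            if checksum in seen:
                continue  # identical content was already yielded
            seen.add(checksum)
            yield path
```

Calling list(unique_files("/path/to/docs")) on the samba mount (the path here is a placeholder) would give the files worth indexing. Which copy counts as the original depends only on traversal order, so sort differently if a particular copy should win.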

Can you just list what files need to be skipped with FileRules?

Bill Moseley
Received on Thu Sep 16 15:54:50 2004