Re: [swish-e] Indexing starts all over again

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue Aug 18 2009 - 02:10:34 GMT
Paras Fadte wrote on 8/14/09 12:00 AM:
> Tried it but doesn't seem to work.
> 
> On Wed, Aug 12, 2009 at 7:04 PM, Peter Karman<peter@peknet.com> wrote:
>> Paras Fadte wrote on 08/11/2009 04:38 AM:
>>> Hi,
>>>
>>> I have had a strange problem while indexing with swish-e: it appears
>>> to start indexing the data all over again, as if it were stuck in a
>>> loop. When I try with, say, max_depth=1 or 2, it works fine. Can
>>> anybody point out what could be happening here?
>>>
>> Sounds like the spider.pl (I assume you are using that) is not
>> identifying URLs as duplicates. You could try turning on the md5 option
>> as described in the documentation:
>> http://swish-e.org/docs/spider.html#duplicate_documents
>>
>> Search for 'use_md5' in the docs and make sure you have the Digest::MD5
>> perl module installed from CPAN.


Without more details and/or a reproducible test case, we'll just be guessing.
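
For reference, here is a minimal sketch of a spider.pl config file with the use_md5 option from the earlier suggestion turned on. The base_url, email, and max_depth values are placeholders and are not taken from the setup described in this thread; only use_md5 and max_depth are the options actually discussed above.

```perl
# SwishSpiderConfig.pl -- hypothetical example, adjust to your own site.
# spider.pl reads the @servers array from this file.

my %server = (
    base_url  => 'http://www.example.com/',   # placeholder start URL
    email     => 'you@example.com',           # placeholder contact address
    max_depth => 5,                           # assumed value; limits link depth
    use_md5   => 1,                           # skip pages whose content was already seen
);

@servers = ( \%server );

1;  # the config file must return a true value
```

With use_md5 set, the spider compares an MD5 digest of each fetched page against the ones it has already seen, so the same content reached under different URLs should only be indexed once. It needs Digest::MD5 to be installed; running perl -MDigest::MD5 -e 1 should exit without an error if it is.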

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
gpg key: 37D2 DAA6 3A13 D415 4295  3A69 448F E556 374A 34D9
Received on Mon Aug 17 22:10:36 2009