I use the file system mode rather than spidering. The problem arises
with multiple indexes. I index each day's directory separately, like:
2006/04/01/articles/
2006/04/02/articles/
...
For each day, there is a corresponding index, like:
20060401.swish-e
20060402.swish-e
I then run searches against *.swish-e. Duplication occurs when the same
article appears on more than one day, so I use a Berkeley DB file to
keep track of checksums between days.
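
For illustration, here is a minimal sketch of the checksum bookkeeping described above. It is not my production code: Python's standard dbm module stands in for the Berkeley DB file, and the names md5_of_file and unseen_articles are hypothetical. It walks one day's directory and returns only the files whose MD5 checksum has not been recorded on an earlier day.

```python
import dbm
import hashlib
import os


def md5_of_file(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unseen_articles(day_dir, db_path):
    """List files under day_dir whose checksum is not yet in the DB.

    Each new checksum is stored as it is found, so an article that
    reappears on a later day is skipped. dbm is an assumption here,
    standing in for the external Berkeley DB file.
    """
    fresh = []
    with dbm.open(db_path, "c") as seen:
        for root, dirs, files in os.walk(day_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                path = os.path.join(root, name)
                key = md5_of_file(path).encode("ascii")
                if key in seen:
                    continue  # already indexed on an earlier day
                seen[key] = path.encode("utf-8")
                fresh.append(path)
    return fresh
```

Only the paths returned for a given day would then be handed to the indexer when building that day's index, so a search across all of the daily indexes should not return the same article twice.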
>
> are the URLs you are passing to swish-e unique?
>
> Patrick O'Lone scribbled on 4/26/06 8:54 AM:
>> Hello,
>>
>> I've been using swish-e for some time now. I think it's a great
>> product, but I've had to use a special hack to avoid heavy
>> duplication issues within the index. I use MD5 checksums in an
>> external Berkeley DB file for maintaining uniqueness within a
>> collection of documents - I was wondering if there is a better way.
>> Is it possible to have a unique key in a swish-e index file or would
>> that require the incremental mode feature? Also, will version 2.4.4
>> be coming out soon or is it on hold indefinitely? Thanks for any
>> feedback!
>>
>
--
Patrick O'Lone
Software Project Manager
TownNews.com
E-mail ... polone@townnews.com
Phone .... 309-743-0809
Fax ...... 309-743-0830
Received on Thu Apr 27 08:27:31 2006