
Re: Incremental Mode

From: <redna(at)not-real.euskalerria.org>
Date: Mon Feb 23 2004 - 08:34:20 GMT
>On Fri, Feb 20, 2004 at 07:01:06AM -0800, Ander wrote:
>> I've been using incremental indexing mode, and I haven't had any problems.
>> It works fine on my machines.
>> 
>> It relies on file modification dates for local files, but I'm not sure it
>> recognises modified "remote" files (web files).
>
>No, that's not exactly correct.
>
>-u only adds files to an existing index.  I suspect if you did:
>
>  swish-e -i foo.html     # creates new index
>  swish-e -u -i foo.html  # add foo.html to existing index
>
>that foo.html would be in the index twice.

We could cache the content or calculate an MD5 checksum to determine which
files have been "modified", right?

We could create a Berkeley DB hash to store the checksums of the dynamic web
pages we want to index, and use the filter_content feature of spider.pl to
decide which files to index.
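The checksum idea above can be sketched as follows. This is a minimal Python sketch, not spider.pl code. It uses Python's standard `dbm` module as a stand-in for the proposed Berkeley DB hash, and `content_changed` is a hypothetical helper name. In spider.pl the same comparison would sit inside the filter_content callback, which is not shown here.

```python
import hashlib
from collections.abc import MutableMapping


def content_changed(store: MutableMapping, url: str, content: bytes) -> bool:
    """Return True if `content` for `url` is new or differs from the stored checksum.

    `store` maps URL -> MD5 hex digest. A dict works for a single run;
    dbm.open(path, "c") gives a persistent on-disk hash with the same
    mapping interface (dbm hands values back as bytes).
    """
    digest = hashlib.md5(content).hexdigest().encode("ascii")
    previous = store.get(url)
    if isinstance(previous, str):
        previous = previous.encode("ascii")
    if previous == digest:
        return False  # same bytes as last crawl: skip re-indexing
    store[url] = digest  # new or changed page: remember its checksum
    return True


if __name__ == "__main__":
    import dbm
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # dbm stands in for the Berkeley DB hash suggested in the message
        with dbm.open(os.path.join(tmp, "checksums"), "c") as db:
            url = "http://www.example.com/page.html"  # hypothetical URL
            print(content_changed(db, url, b"<html>v1</html>"))  # True: first sight
            print(content_changed(db, url, b"<html>v1</html>"))  # False: unchanged
            print(content_changed(db, url, b"<html>v2</html>"))  # True: modified
```

A checksum only identifies which pages can be skipped. The reply quoted in this message says -u only adds documents. A changed page passed to `swish-e -u` would therefore likely leave its old copy in the index as well.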

What do you think?

>To use incremental indexing you have to build swish-e differently (with
>--enable-incremental option).  This uses a different index format (Btree
>instead of a hash-based index) and is not compatible with other (non
>Btree) indexes.
>
>There's currently no way to update an index (e.g. if an existing
>file is updated).  This type of incremental indexing might be useful for
>something like a mailing list where the index just gets added to (old
>messages don't change).
>
>You can use other methods (like -D) to only pass to swish-e the new
>files to be added to an existing index.
>
>-- 
>Bill Moseley
>moseley@hank.org
>
>
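The suggestion above, to pass swish-e only the new files, can be sketched like this. It is a minimal Python sketch under several assumptions. It uses the index file's modification time as the "last indexed" marker, and `index.swish-e` is assumed as the index name. The helper names are hypothetical. As in the mailing-list case, existing files are assumed never to change, so anything newer than the index is treated as new. It only builds the command line from the quoted `swish-e -u -i` example. It does not reproduce the -D option mentioned in the reply.

```python
import os
from pathlib import Path


def files_newer_than(root: str, reference: str) -> list[str]:
    """List files under `root` modified after `reference` (e.g. the index file).

    Assumes existing files never change, so "newer" means "new".
    """
    cutoff = os.path.getmtime(reference)
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime > cutoff
    )


def incremental_command(new_files: list[str]) -> list[str]:
    """Build a `swish-e -u -i ...` argument list, or [] if nothing is new.

    Assumes swish-e was built with --enable-incremental, as described above.
    """
    if not new_files:
        return []
    return ["swish-e", "-u", "-i", *new_files]
```

The resulting list could be run with `subprocess.run(cmd, check=True)`. If existing files can change, this selection also picks them up, and -u would then add them a second time.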
Received on Mon Feb 23 00:34:31 2004