Re: Merging indexes

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Jul 01 2002 - 21:41:58 GMT
At 11:06 AM 07/01/02 -0700, CBol wrote:
>
>> How about for now when searching you do:
>>
>>     ./swish-e -w $query -f index1 index2 index3 ...
>
> It's what I'm doing. My concern is what may happen over time, when I feed
> more and more files to the search engine, one more index each month.
> OK, I may reindex everything from time to time, but it would be a more
> elegant solution if I could sum the indexes.

Incremental indexing is a problem.  There's work toward incremental
indexing but it will be a while before it's available. Swish is very fast
at indexing so if you are not indexing hundreds of thousands of files then
reindexing typically isn't a huge issue.

You might want to look at htdig, as I think it does incremental indexing.  
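
As the number of monthly indexes grows, the multi-index command quoted
above can be assembled by a small wrapper instead of typed by hand. This is
only a sketch: the -w and -f switches are the ones shown in the quoted
command, while the function name, the directory layout, and the "*.index"
naming pattern are assumptions made for illustration.

```python
from pathlib import Path


def build_search_command(query, index_dir, pattern="*.index"):
    """Build a swish-e command line that searches every index in index_dir.

    The "*.index" naming convention is hypothetical; adjust it to match
    however the monthly index files are actually named.
    """
    # Sort so the argument order is stable from one run to the next.
    indexes = sorted(str(p) for p in Path(index_dir).glob(pattern))
    if not indexes:
        raise FileNotFoundError(f"no index files matching {pattern!r} in {index_dir}")
    # Same shape as: ./swish-e -w $query -f index1 index2 index3 ...
    return ["swish-e", "-w", query, "-f", *indexes]
```

The returned list can be handed to subprocess.run() as is, so a new monthly
index is picked up automatically once it lands in the directory.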


>> Merge does not work well in the 2.1-dev version, and is a current topic
>> among the developers.  It uses way too much memory.
>
> What then? Memory is very inexpensive today. My index is 15 MBytes in size,
> well below the memory I have available in my machine (256M).

You would have to try and see how it goes.


>> 17 hours is a long time for indexing.  How many files were you indexing?
>
> Circa 10000, and I'm not fluent enough in Perl to edit the scripts and use
> the prog feature. ;-(

Do you have a delay set in the swish-e config?  I think I can index 10,000
files in less time: ;)

10000 files indexed.  19646749 total bytes.  2031037 total words.
Elapsed time: 00:00:20 CPU time: 00:00:13
Indexing done!

My advice would be to see if you can fetch the files faster, or better yet,
cache them compressed locally.  Of course, what's 17 hours -- just start it
and let it run.
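
Caching the files compressed locally could look something like the sketch
below. It assumes the documents are fetched over HTTP; the function names,
the cache directory, and the choice of a SHA-256 hash of the URL as the
cache file name are all illustrative and not part of swish-e.

```python
import gzip
import hashlib
from pathlib import Path
from urllib.request import urlopen


def _download(url):
    """Fetch the raw bytes for url over the network."""
    with urlopen(url) as resp:
        return resp.read()


def fetch_cached(url, cache_dir="cache", download=_download):
    """Return the document at url, using a gzip-compressed local copy if present.

    The cache layout (one .gz file per URL, named by the URL's SHA-256
    hash) is a hypothetical choice for this sketch.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    path = cache / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".gz")
    if path.exists():
        # Cache hit: no network round trip, just decompress the local copy.
        return gzip.decompress(path.read_bytes())
    data = download(url)
    path.write_bytes(gzip.compress(data))
    return data
```

With something like this in front of the fetch step, a later reindex reads
from local disk and only new documents cost a network round trip. The sketch
never expires entries, so changed documents would need the cache cleared.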




-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Jul 1 21:45:30 2002