long merge times

From: Michael <michael(at)>
Date: Tue Jul 16 2002 - 00:01:10 GMT
> On Mon, 15 Jul 2002, Michael wrote:
> > merges are slower than 2.05 -- hard to say quantitatively now. My 
> > guess is around 2:1 for files as above at least. I used to be able to 
> > merge the whole thing in a day or two but based on the benchmark of 
> > above, it would take 80-100 hrs or more.
> I really can not remember.  IIRC, merge in 2.1-dev works like it did
> in 2.05, but 2.1's indexing is much faster and uses less RAM.  Merge
> in 2.1 does not take advantage of all the compression features of
> normal indexing.  Hopefully that will be fixed sometime...
> The index format is different with 2.1, so that might be one reason
> it's slower than 2.05.
> You have two options over merging.  One is to index everything at
> once, and the other is to specify more than one index file on the
> command line when searching.
> And you can use -e if you are short on RAM.

Neither of those is particularly appealing. We have 4-5 years' worth 
of data and accumulate new data daily. The indexes are broken down 
by directory, one per month, so not merging would mean searching up 
to 50 index files for a full search. Merging monthly is OK, but it 
takes a LONGGGG time even for a one-month add. Since swish does not 
have an "exclude dir/file" switch, indexing the entire site minus the 
last two months' directories is not doable, whereas it is easily 
accomplished with a merge.
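
For reference, the "more than one index file on the command line" option from the quoted reply could be scripted roughly as below. This is only a sketch: the `*.index` naming scheme and the helper name `build_search_command` are assumptions made for illustration, not anything from the actual setup. It relies on swish-e's `-w` (search words) and `-f` (index files) switches.

```python
from pathlib import Path


def build_search_command(index_dir, query, binary="swish-e"):
    """Return an argv list that searches every monthly index in index_dir.

    Assumes (hypothetically) one index file per month, named like
    2002-06.index, all stored in a single directory.
    """
    index_files = sorted(str(p) for p in Path(index_dir).glob("*.index"))
    if not index_files:
        raise ValueError(f"no index files found in {index_dir}")
    # -w takes the search words; -f takes one or more index files.
    return [binary, "-w", query, "-f", *index_files]
```

The returned list can be handed to `subprocess.run` as-is. With around 50 monthly indexes the command line stays manageable, though every search still has to open each file.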

I tried a whole site merge just to measure the time; it took about 45 
minutes to create a 90meg index file in economy mode. That is LESS 
time THAN A MERGE of 3megs + 30megs, which takes 90 minutes. This 
does not seem right.
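
To put rough numbers on that gap, here is a back-of-the-envelope comparison in Python. It uses index size per minute as a crude proxy for work done, which is an assumption: the 90meg figure is an output size, while 3megs + 30megs are input sizes.

```python
def mb_per_minute(size_mb, minutes):
    """Crude throughput figure: megabytes of index handled per minute."""
    return size_mb / minutes


# Figures taken from the timings in this message.
whole_site = mb_per_minute(90, 45)         # 90meg index in 45 minutes
monthly_merge = mb_per_minute(3 + 30, 90)  # 3megs + 30megs in 90 minutes

ratio = whole_site / monthly_merge
print(f"whole site:    {whole_site:.2f} MB/min")
print(f"monthly merge: {monthly_merge:.2f} MB/min")
print(f"ratio:         {ratio:.1f}x")
```

By this measure the small merge runs at well under a quarter of the rate of the whole-site run.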

I'd like some suggestions...

Received on Tue Jul 16 00:04:40 2002