Re: running out of memory during merge

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 06 2001 - 07:36:22 GMT
At 10:15 PM 12/05/01 -0800, Benjamin Grosser wrote:
>I'm getting the following message when trying to merge index files:
>
> swish: Ran out of memory (could not allocate enough)!

You may be out of luck.

Maybe Jose will have some comments (or correct anything that I say
incorrectly).

First, merging isn't just a matter of copying two files together.  It
basically goes through the entire indexing process again.

Jose and Bill Meier and others put in a lot of work optimizing the indexing
process.  I can now index on my machine in about three minutes what once
took five hours ;)  (ok, so a machine swapping to death isn't a good
comparison.)

But the optimizations are really focused on file-by-file indexing.  Jose
does a lot of compression after processing each file while indexing.  None
of that is done while merging.  Basically, I think, merging falls back to
something like swish 1.3 memory requirements.

I can index about 25K files on my machine in about 70M.  If I try to merge
that index with another index with only one file, I run out of memory.

You have two options at this time.

Index everything at once, or, if that's not reasonable, specify multiple
index files to swish with the -f switch.
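
As a rough sketch of the second option, the snippet below builds a search
command line that passes several index files after -f instead of one merged
index.  The binary name "swish-e", the -w query switch, and the index file
names are assumptions for illustration; adjust them to match your install.

```python
import shlex


def build_search_command(query, index_files, binary="swish-e"):
    """Return an argv list that searches several indexes in one run.

    The binary name and the -w switch are assumptions about the local
    install; the index file names passed in are hypothetical examples.
    """
    if not index_files:
        raise ValueError("at least one index file is required")
    # All index files are listed after a single -f switch.
    return [binary, "-w", query, "-f", *index_files]


if __name__ == "__main__":
    cmd = build_search_command("memory", ["part1.index", "part2.index"])
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(cmd))
```

The list can be handed to subprocess.run() as-is once the names match your
setup.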

The disadvantage of using multiple index files, IIRC, is only that sorting
very large result sets is slower and requires more memory.  With a single
index, sorting uses the "pre-sorted" tables, which makes it faster.

There may be some way to use better compression during merge, but I think
it will be a while before that gets any attention.

>However, now it seems as if the merge is dying regardless of how much is
>free.  It seems to run right up to using about 530MB of physical RAM and
>then quits.  I freed up to about 650MB and it still quits after about
>530MB (hard to tell--I'm watching top so am not getting precise figures). 

Could you also be running up against a ulimit setting?

Try running with multiple indexes using -f and see how that works.


Bill Moseley
mailto:moseley@hank.org
Received on Thu Dec 6 07:36:55 2001