On Mon, May 03, 2004 at 11:02:57AM -0700, Weir James K Contr ASC/ENOI wrote:
> > Pushing the limits with 3 million files, I suppose. How long
> > does it take to index?
> It takes about 3 days to index.
Are you running out of RAM? I assume you are using -e when indexing.
Swish-e uses several hash-based stores, and they do not scale well.
When indexing large file sets without -e, you can really see indexing
slow down over time.
I once wrote a small program to generate random documents using words from a
dictionary. The program generated progress reports on the number of
words per minute indexed. Without -e, indexing really slowed down at
about a million documents. With -e, indexing started out slower but didn't slow
down as much. IIRC, it took about an hour or so to index a million
of those simple "documents".
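A minimal Python sketch of that kind of benchmark, under stated assumptions: the original program is not shown, so the names random_documents, Throughput, and run, and the index_fn hook, are hypothetical. It draws words at random from a word list, hands each document to a caller-supplied indexing function, and reports a words-per-minute figure every report_every documents. It does not talk to swish-e itself; connecting index_fn to a real indexer is left open.

```python
import random
import time
from typing import Callable, Iterator, List, Sequence


def random_documents(words: Sequence[str], n_docs: int, words_per_doc: int,
                     seed: int = 0) -> Iterator[List[str]]:
    """Yield n_docs documents, each a list of words drawn at random from `words`."""
    rng = random.Random(seed)  # seeded so repeated runs see identical input
    for _ in range(n_docs):
        yield [rng.choice(words) for _ in range(words_per_doc)]


class Throughput:
    """Counts words processed and reports a words-per-minute rate."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.start = clock()
        self.words = 0

    def add(self, n_words: int) -> None:
        self.words += n_words

    def words_per_minute(self) -> float:
        elapsed = self.clock() - self.start
        # Multiply before dividing; report 0.0 if no time has elapsed yet.
        return self.words * 60.0 / elapsed if elapsed > 0 else 0.0


def run(words: Sequence[str], n_docs: int, words_per_doc: int,
        index_fn: Callable[[int, List[str]], None],
        report_every: int = 100_000,
        report: Callable[[str], None] = print) -> Throughput:
    """Feed random documents to index_fn (hypothetical hook), reporting progress."""
    meter = Throughput()
    for i, doc in enumerate(random_documents(words, n_docs, words_per_doc), start=1):
        index_fn(i, doc)  # hand document number i to the indexer under test
        meter.add(len(doc))
        if i % report_every == 0:
            report(f"{i} docs, {meter.words_per_minute():.0f} words/min")
    return meter
```

On many Unix systems a word list is available at /usr/share/dict/words (an assumption about the local setup). Because the generator is seeded, runs with and without -e can be compared on identical input, and a falling words-per-minute figure in the reports shows the slowdown directly.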
Anyway, three days is not an acceptable amount of time for indexing. If
it's not something obvious (like running out of RAM) then you might want
to look into other indexing systems that are designed for larger
indexing jobs.
--
Bill Moseley
moseley@hank.org
Received on Mon May 3 11:21:46 2004