Re: [swish-e] indexing performance expectations

From: Peter Finch <PFinch(at)>
Date: Sun Jul 13 2008 - 22:43:01 GMT
Hi Patrick,

We have a similar problem; we have about 900,000+ documents at over

Fortunately for me, the documents are grouped into directories, and I
only reindex the groups that change into an "intermediary" index (I use
a Makefile to detect which directories were updated). Then I merge all
the intermediary indexes into the final index. It still takes a while
(~1 hour on a SPARC V210), but it's faster than doing it all from
scratch.

On average it's faster to merge; however, if everything changes, it
actually takes longer. Fortunately, that does not happen very often.
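
For anyone wanting to try the Makefile-style approach described above,
here is a minimal Python sketch of the same idea. It rests on a few
assumptions that are not in the post: each group is a subdirectory of one
document root, each intermediary index is named <group>.index in a
separate directory, and a group counts as changed when anything in it is
newer than its index (the timestamp rule make itself uses). The
"swish-e -i DIR -f INDEX" and "swish-e -M INPUT... OUTPUT" invocations
follow the Swish-e 2.x command line; please check them against your
installed version. By default the script only prints the commands.

```python
import os
import subprocess
from pathlib import Path

# Assumed layout (hypothetical, not from the post): one subdirectory per
# document group under doc_root, one intermediary index per group named
# <group>.index inside index_dir.


def index_path(index_dir: Path, group: Path) -> Path:
    return index_dir / f"{group.name}.index"


def newest_mtime(directory: Path) -> float:
    """Newest modification time of the directory itself or anything under it."""
    newest = directory.stat().st_mtime
    for dirpath, _dirnames, filenames in os.walk(directory):
        newest = max(newest, os.stat(dirpath).st_mtime)
        for name in filenames:
            newest = max(newest, os.stat(os.path.join(dirpath, name)).st_mtime)
    return newest


def stale_groups(doc_root: Path, index_dir: Path) -> list:
    """Make-style rule: a group is stale if its index is missing or older than its files."""
    stale = []
    for group in sorted(p for p in doc_root.iterdir() if p.is_dir()):
        index = index_path(index_dir, group)
        if not index.exists() or index.stat().st_mtime < newest_mtime(group):
            stale.append(group)
    return stale


def build_commands(doc_root: Path, index_dir: Path, final_index: Path) -> list:
    """Reindex only the stale groups, then merge every intermediary index."""
    commands = [
        ["swish-e", "-i", str(group), "-f", str(index_path(index_dir, group))]
        for group in stale_groups(doc_root, index_dir)
    ]
    if not commands and final_index.exists():
        return []  # nothing changed: keep the existing final index
    groups = sorted(p for p in doc_root.iterdir() if p.is_dir())
    merge_inputs = [str(index_path(index_dir, g)) for g in groups]
    commands.append(["swish-e", "-M", *merge_inputs, str(final_index)])
    return commands


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Something like run(build_commands(Path("docs"), Path("indexes"),
Path("final.index")), dry_run=False) would execute the plan; those paths
are placeholders. Skipping the final merge when no group changed mirrors
what make would do, and it is also where the trade-off above shows up: if
every group is stale, you pay for all the reindexing plus the merge.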


Also, be careful with the number of "intermediary" indexes, as Swish
can only merge a few dozen at once.
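
If the number of intermediary indexes grows past that limit, one option
is to merge in stages. The post doesn't give the exact limit, so the
sketch below uses a hypothetical MAX_MERGE = 20. While there are more
indexes than one merge may take, it merges them in batches into temporary
indexes (hypothetical names under tmp/), then merges those results into
the final index. It assumes the same "swish-e -M INPUT... OUTPUT" form as
Swish-e 2.x and that a merged index can itself be merged again; verify
both on your installation before relying on it.

```python
from pathlib import Path

# Hypothetical limit: the post only says "a few dozen"; tune for your setup.
MAX_MERGE = 20


def merge_plan(indexes: list[str], final_index: str,
               max_inputs: int = MAX_MERGE, tmp_dir: str = "tmp") -> list[list[str]]:
    """Plan swish-e -M commands so no single merge takes more than max_inputs indexes."""
    if max_inputs < 2:
        raise ValueError("max_inputs must be at least 2")
    commands = []
    level = 0
    current = list(indexes)
    # Each pass shrinks the list to ceil(n / max_inputs) entries.
    while len(current) > max_inputs:
        next_level = []
        for i in range(0, len(current), max_inputs):
            chunk = current[i:i + max_inputs]
            if len(chunk) == 1:
                next_level.append(chunk[0])  # nothing to merge; carry it up
                continue
            out = str(Path(tmp_dir) / f"merge-{level}-{i // max_inputs}.index")
            commands.append(["swish-e", "-M", *chunk, out])
            next_level.append(out)
        current = next_level
        level += 1
    commands.append(["swish-e", "-M", *current, final_index])
    return commands
```

For example, 45 intermediary indexes with a limit of 20 become three
batch merges followed by one final merge of the three partial indexes.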


I hope this helps.

Peter Finch

On Behalf Of Patrick May
Sent: Saturday, 12 July 2008 12:26 AM
Subject: [swish-e] indexing performance expectations

What indexing performance should I expect when indexing 900,000+ very
small documents (256 MB in total)? So far, my observation is that it
takes a while. Would it help to move to an incremental format?

~ p

Patrick May
135 Oak Street
New York, NY 11222
+1 (347) 232-5208

