Re: [swish-e] indexing performance expectations

From: Peter Finch <PFinch(at)not-real.cch.com.au>
Date: Sun Jul 13 2008 - 22:43:01 GMT

Hi Patrick,

We have a similar problem: 900,000+ documents totaling over 4 GB.

 

Fortunately for me, the documents are grouped into directories, and I
only reindex the groups that change, each into an "intermediary" index
(I actually use a Makefile to detect which directories were updated).
Then I merge all the intermediary indexes into the final index. It still
takes a while (~1 hour on a SPARC V210), but it's faster than doing it
all from scratch.
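
As a rough illustration of that workflow (not the Makefile itself), the
sketch below finds group directories whose files are newer than their
intermediary index and builds the reindex command for each. The directory
layout, the ".index" naming, and the "swish.conf" file name are assumptions
made for the example; only the swish-e switches -c, -i and -f are taken
from the tool. Note that a timestamp check like this does not notice
deleted files.

```python
from __future__ import annotations

from pathlib import Path


def newest_mtime(group_dir: Path) -> float:
    """Return the most recent modification time of any file under group_dir."""
    times = [p.stat().st_mtime for p in group_dir.rglob("*") if p.is_file()]
    return max(times, default=0.0)


def stale_groups(doc_root: Path, index_dir: Path) -> list[Path]:
    """List group directories that are newer than their intermediary index.

    Assumes one sub-directory per group and one '<group>.index' file per
    group in index_dir (both hypothetical conventions).
    """
    stale = []
    for group in sorted(p for p in doc_root.iterdir() if p.is_dir()):
        index_file = index_dir / f"{group.name}.index"
        if not index_file.exists():
            stale.append(group)  # never indexed
        elif newest_mtime(group) > index_file.stat().st_mtime:
            stale.append(group)  # something changed since the last run
    return stale


def reindex_command(group: Path, index_dir: Path,
                    config: str = "swish.conf") -> list[str]:
    """Build the swish-e command that rebuilds one intermediary index."""
    index_file = index_dir / f"{group.name}.index"
    return ["swish-e", "-c", config, "-i", str(group), "-f", str(index_file)]
```

Each command returned by reindex_command could be passed to
subprocess.run(cmd, check=True); only the stale groups are rebuilt before
the merge step.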

 

On average it's faster to merge. However, if everything changes, it
actually takes longer than a full reindex. Fortunately, that does not
happen very often.

 

Also, be careful with the number of "intermediary" indexes, as Swish-e
can only merge a few dozen at once.
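
One way to stay under that limit is to merge in batches and then merge the
batch results. The sketch below only plans the commands; it relies on the
"swish-e -M index1 index2 ... outputindex" form. The batch size of 20 and
the temporary file names are assumptions for illustration, not a documented
limit.

```python
from __future__ import annotations


def merge_plan(indexes: list[str], final: str,
               batch_size: int = 20, _level: int = 0) -> list[list[str]]:
    """Plan swish-e -M commands so no single merge exceeds batch_size inputs.

    batch_size is a guess standing in for "a few dozen"; the temporary
    output names ('<final>.tmp<level>.<n>') are hypothetical.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    if len(indexes) < 2:
        raise ValueError("need at least two indexes to merge")

    # Few enough inputs: a single merge straight into the final index.
    if len(indexes) <= batch_size:
        return [["swish-e", "-M", *indexes, final]]

    commands: list[list[str]] = []
    merged: list[str] = []
    for n, start in enumerate(range(0, len(indexes), batch_size)):
        batch = indexes[start:start + batch_size]
        if len(batch) == 1:
            merged.append(batch[0])  # a lone leftover is carried forward as-is
            continue
        out = f"{final}.tmp{_level}.{n}"
        commands.append(["swish-e", "-M", *batch, out])
        merged.append(out)

    # Merge the batch outputs, recursing if there are still too many.
    return commands + merge_plan(merged, final, batch_size, _level + 1)
```

For 45 intermediary indexes and a batch size of 20, this plans three batch
merges followed by one final merge of the three temporary indexes.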

 

I hope this helps.

Regards,
Peter Finch

 

________________________________

From: users-bounces@lists.swish-e.org
[mailto:users-bounces@lists.swish-e.org] On Behalf Of Patrick May
Sent: Saturday, 12 July 2008 12:26 AM
To: users@lists.swish-e.org
Subject: [swish-e] indexing performance expectations

 

Hello,

How should I expect indexing to perform when indexing 900,000+ very
small documents (256 MB)?  Thus far, my observation is that it takes a
while.  Could it be helpful to move to an incremental format?

Cheers,

~ p


-- 
Patrick May
135 Oak Street
New York, NY 11222
+1 (347) 232-5208
patrick@hexane.org
http://www.hexane.org



_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Sun Jul 13 18:43:07 2008