Indexing performance, multi-million words

From: Jean-François PIÉRONNE <jfp(at)>
Date: Wed Dec 26 2001 - 10:54:29 GMT
Hi all,

While indexing a large document set (more than 11,000 files, 1.2 GB and nearly
4.5 million words), I noticed that indexing time can be greatly reduced when I
increased the

I don't know which of the three is the most significant, but indexing time
dropped from 6 hours to less than 2 hours, and these 2 hours are mostly CPU-bound.

Maybe these parameters could be made dynamic (configuration parameters) or have larger

After this change, most of the time (80-90%) is spent in the "writing word
data" phase, which uses a lot of CPU and performs millions of reads on the
temporary file built during the parsing-collecting pass.
I haven't isolated which routines are costly.
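
One way to check whether the read pattern on a temporary file matters is a
small experiment outside the indexer. The Python sketch below is only an
illustration: the fixed 16-byte record layout, the function names and the
record count are all hypothetical and are not taken from the indexer's actual
temporary file format. It reads the same records once with one unbuffered
seek and read per record, and once in large chunks, and times both.

```python
import os
import tempfile
import time

RECORD_SIZE = 16  # hypothetical fixed record size, not the indexer's format


def write_records(path, count):
    """Write `count` fixed-size records, each holding its own index."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(i.to_bytes(RECORD_SIZE, "little"))


def sum_with_small_reads(path, count):
    """One unbuffered seek + read per record (many system calls)."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        for i in range(count):
            f.seek(i * RECORD_SIZE)
            total += int.from_bytes(f.read(RECORD_SIZE), "little")
    return total


def sum_with_large_reads(path, chunk_records=4096):
    """Read many records per call, then decode them in memory."""
    total = 0
    chunk_size = chunk_records * RECORD_SIZE
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for off in range(0, len(chunk), RECORD_SIZE):
                total += int.from_bytes(chunk[off:off + RECORD_SIZE], "little")
    return total


def compare(count=200_000):
    """Return (small_total, large_total, small_seconds, large_seconds)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, count)
        t0 = time.perf_counter()
        small = sum_with_small_reads(path, count)
        t1 = time.perf_counter()
        large = sum_with_large_reads(path)
        t2 = time.perf_counter()
        return small, large, t1 - t0, t2 - t1
    finally:
        os.remove(path)
```

Calling `compare()` and printing the two timings shows how much the per-record
read pattern costs on a given machine. The two totals should be equal, which
confirms that both variants read the same data. This does not say which
routine in the indexer is costly; a profiler run on the real program would
still be needed for that.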

