Re: Indexing performances, multi millions words

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Dec 31 2001 - 01:50:55 GMT
At 07:37 AM 12/28/01 -0800, Jean-François PIÉRONNE wrote:
>As you can see, there is a big win from increasing the *HASHSIZE parameters.
>
>So, IMHO, it would be better to default the three HASHSIZE parameters to the
>following settings:
>HASHSIZE 1009
>BIGHASHSIZE 10001
>SEARCHHASHSIZE 100003
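A rough way to see why larger tables help: in a hash table with separate chaining and a reasonably uniform hash function, the average chain length (load factor) is the number of keys divided by the number of buckets, and a successful lookup costs about 1 + alpha/2 comparisons. The sketch below assumes, purely for illustration, that all 682,368 unique words from the run reported below land in a single chained table. It does not model swish-e's actual hash function, nor which of the three tables holds which data; the helper names are hypothetical.

```python
# Back-of-envelope model only (assumption): one chained hash table, simple
# uniform hashing. swish-e's real hash functions and table usage are not modelled.
UNIQUE_WORDS = 682368  # unique words reported by the indexing run in this message


def load_factor(n_keys: int, n_buckets: int) -> float:
    """Average number of keys per bucket (alpha = n / m)."""
    return n_keys / n_buckets


def expected_successful_probes(n_keys: int, n_buckets: int) -> float:
    """Expected key comparisons for a successful lookup with separate
    chaining under simple uniform hashing: 1 + alpha/2 - alpha/(2n)."""
    alpha = load_factor(n_keys, n_buckets)
    return 1 + alpha / 2 - alpha / (2 * n_keys)


if __name__ == "__main__":
    # Compare the three table sizes suggested above for the same word count.
    for buckets in (1009, 10001, 100003):
        alpha = load_factor(UNIQUE_WORDS, buckets)
        probes = expected_successful_probes(UNIQUE_WORDS, buckets)
        print(f"{buckets:>7} buckets: alpha = {alpha:8.2f}, ~{probes:7.2f} comparisons per hit")
```

Under these assumptions, going from 1009 to 100003 buckets shrinks the average chain from several hundred entries to fewer than seven.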

I don't have a lot of data to work with, but here's my test with the
different settings:

Old hash sizes:

  PID USERNAME     PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
16203 moseley       64   0   324M   323M CPU1   1   9:43 97.95% 97.95% swish-e

682368 unique words indexed.
2 properties sorted.                                              
38840 files indexed.  457923344 total bytes.  19964931 total words.
Elapsed time: 00:12:47 CPU time: 00:09:45
Indexing done!

(It sure would be nice to have some commas in those numbers...)
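Until the indexer prints separators itself, the summary lines could be post-processed. The helper below is a hypothetical sketch, not a swish-e feature: it inserts commas into every standalone run of four or more digits using Python's `,` format option. Because it rewrites any such number, it would also touch four-digit years or IDs if applied to arbitrary text.

```python
import re


def add_commas(line: str) -> str:
    """Insert thousands separators into each standalone run of 4+ digits."""
    return re.sub(r"\b\d{4,}\b", lambda m: f"{int(m.group()):,}", line)


if __name__ == "__main__":
    # Sample summary line copied from the indexing run above.
    sample = "38840 files indexed.  457923344 total bytes.  19964931 total words."
    print(add_commas(sample))
```

Wrapping `add_commas` in a loop over `sys.stdin` would let the indexer's output be piped through it; short runs such as the `00:12:47` timings are left alone.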

-rw-r--r--  1 moseley  moseley  70685718 Dec 30 17:19 index.swish-e
-rw-r--r--  1 moseley  moseley  44974080 Dec 30 17:19 index.swish-e.prop


Now, using your suggested values:


  PID USERNAME     PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
18774 moseley       64   0   325M   325M CPU0   1   7:44 96.29% 96.29% swish-e

682368 unique words indexed.
2 properties sorted.                                              
38840 files indexed.  457923344 total bytes.  19964931 total words.
Elapsed time: 00:10:18 CPU time: 00:07:44
Indexing done!

-rw-r--r--  1 moseley  moseley  71045726 Dec 30 17:32 index.swish-e
-rw-r--r--  1 moseley  moseley  44974080 Dec 30 17:32 index.swish-e.prop


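It's hard to measure accurately with just one run each, on a busy machine, and
with such a small amount of data, but the improvement looks significant: CPU
time dropped from 9:45 to 7:44 and elapsed time from 12:47 to 10:18.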
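To put a number on the two runs, the timings can be converted to seconds and compared. This is a small sketch with hypothetical helper names; it assumes the `HH:MM:SS` format that swish-e printed above, and with only one run per setting the result is indicative rather than statistically meaningful.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' time string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_reduction(before: str, after: str) -> float:
    """Percentage by which the `after` time is shorter than `before`."""
    b, a = hms_to_seconds(before), hms_to_seconds(after)
    return 100.0 * (b - a) / b


if __name__ == "__main__":
    # Timings copied from the two runs above (old vs. suggested hash sizes).
    print(f"elapsed: {percent_reduction('00:12:47', '00:10:18'):.1f}% less")
    print(f"CPU:     {percent_reduction('00:09:45', '00:07:44'):.1f}% less")
```

For these two runs that works out to roughly a 19% cut in elapsed time and a 21% cut in CPU time.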

I've just now committed the changes to CVS.  It will be interesting to see
if anyone else notices any improvement.

Thanks for the help!



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Dec 31 01:51:14 2001