Re: Question on indexing time

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Aug 02 2001 - 21:50:24 GMT
At 10:27 AM 08/02/01 -0700, Rick McGowan wrote:
>But the indexing operation doesn't seem to ever complete... It takes forever.
>
>I have 20,000 small files of about 1k bytes each; e-mail archives.  On an  
>800MHz Pentium III with 128MB of main memory (recent Linux OS)... is it  
>reasonable for the initial indexing of this dataset to take over 48 hours?   

No, that's not reasonable.  What exact version are you running?  I would
expect that to index in just a few minutes in 2.1-dev.  I can index 30,000
/usr/doc (much larger) files in about 15 minutes on my machine with 128M --
and that's swapping.

In general, memory is the real problem with swish.  128M is not that much,
but it should be enough for your situation.

2.1-dev might be good for you, especially if you are indexing mail
archives.  You can write a perl script to extract the From:, To:,
Subject:, and Date: headers and feed that data to swish for indexing.
Then you can limit searches to those fields.
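
As a rough illustration of that extraction step, here is a minimal sketch.
It is written in Python rather than perl, and the function names
(extract_fields, to_html) are made up for the example.  It assumes each
archive file holds one plain RFC 822 style message, and it wraps the four
headers in HTML meta tags on the assumption that swish is configured to
treat those meta names as searchable fields.  Check the 2.1-dev docs for
what the indexer actually expects before relying on this.

```python
from email import message_from_string
from html import escape

# Headers to pull out of each message (the four named above).
FIELDS = ("From", "To", "Subject", "Date")


def extract_fields(raw_message):
    """Return a dict of lower-cased header name -> value ('' if missing)."""
    msg = message_from_string(raw_message)
    return {name.lower(): str(msg.get(name, "")) for name in FIELDS}


def to_html(raw_message):
    """Wrap one message in a small HTML document with one meta tag per field.

    The meta-tag layout is an assumption made for this sketch, not a
    format taken from the swish documentation.
    """
    msg = message_from_string(raw_message)
    fields = extract_fields(raw_message)
    # Multipart messages are skipped here to keep the sketch short.
    body = "" if msg.is_multipart() else msg.get_payload()
    metas = "\n".join(
        '<meta name="{}" content="{}">'.format(name, escape(value, quote=True))
        for name, value in fields.items()
    )
    return "<html><head>\n{}\n</head><body><pre>{}</pre></body></html>".format(
        metas, escape(body)
    )
```

You would run something like to_html() over each file in the archive and
hand the result to the indexer, rather than indexing the raw mail files.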

For fun, try indexing again with 2.1-dev.
http://sunsite.berekely.edu:4444/swish-daily/



Bill Moseley
mailto:moseley@hank.org
Received on Thu Aug 2 21:51:32 2001