At 02:53 AM 6/12/2002 -0700, Cristiano Corsani wrote:
>Hi all,
>
>I write just to tell that swish-e works with my big DB.
Cool.
>1111343 files indexed. 2026462918 total bytes. 92327081 total words.
>Elapsed time: 16:27:15 CPU time: 16:27:15
16 hours!
Average size of a doc is about 1,823 bytes.
>on a Pentium IV with 250Mb RAM.
On my Athlon 1800+ with 1/2G of RAM I can index about 24,000 files in
a minute. That's less than an hour for a million. On my PIII-550 the same
24,000 files take about 4 minutes, so that's about 3 hours to do a million
files.
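
As a rough sanity check on those numbers, here is a small Python sketch of the arithmetic. It only uses figures quoted in this thread. The function names are made up for illustration, and it assumes the PIII-550 figure refers to the same 24,000 files as the Athlon figure.

```python
# Back-of-the-envelope indexing rates from the figures in this thread.
# Helper names are illustrative only; nothing here is part of swish-e.

def files_per_minute(files: int, minutes: float) -> float:
    """Indexing rate in files per minute."""
    return files / minutes

def hours_for(total_files: int, rate_per_minute: float) -> float:
    """Hours needed to index total_files at the given rate."""
    return total_files / rate_per_minute / 60

# Reported run: 1111343 files, 2026462918 bytes, elapsed 16:27:15.
reported_files = 1_111_343
reported_bytes = 2_026_462_918
reported_minutes = 16 * 60 + 27 + 15 / 60

avg_doc_bytes = reported_bytes / reported_files
reported_rate = files_per_minute(reported_files, reported_minutes)

# Comparison machines (assumption: both timings are for 24,000 files).
athlon_rate = files_per_minute(24_000, 1)
piii_rate = files_per_minute(24_000, 4)

if __name__ == "__main__":
    print(f"average doc size: {avg_doc_bytes:.0f} bytes")
    print(f"reported run:     {reported_rate:.0f} files/min")
    print(f"Athlon 1800+:     {hours_for(1_000_000, athlon_rate):.1f} h per million files")
    print(f"PIII-550:         {hours_for(1_000_000, piii_rate):.1f} h per million files")
```

The reported run works out to a rate far below either of my machines, which is why I suspect swapping rather than raw CPU speed.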
My guess is you are running out of memory while indexing. Did you index
with the -e switch? It will keep your disk drive busy, but will save RAM.
Better to let swish swap than the OS. Best to use a machine with more RAM.
How does one monitor memory usage on Windows?
So it says: 2,778,708 unique words indexed.
That's a lot of words to index. Will people be searching all those words?
Trim that number down and you will save memory.
Make sure you are *not* indexing a unique record identifier. No point
indexing something you can use to look up the item directly in a database.
Run swish-e -T index_words_only > word_list and then you can look at the
words indexed. You may see words that do not need to be indexed.
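
If the list is too long to eyeball, something like the following Python sketch might help sort it into rough buckets. It is only a sketch: it assumes word_list has one entry per line with the word as the first whitespace-separated token, which may not match the actual -T output, and the bucket names and the 25-character cutoff are arbitrary choices of mine.

```python
# Rough triage of a word list to spot candidates for exclusion.
# Assumption: one entry per line, word is the first token on the line.
import sys
from collections import Counter

LONG_WORD = 25  # arbitrary cutoff for "suspiciously long" words

def classify(word: str) -> str:
    """Put a word into a coarse bucket."""
    if word.isdigit():
        return "all digits"            # e.g. record ids, dates, counters
    if any(ch.isdigit() for ch in word):
        return "mixed letters/digits"  # e.g. part numbers, hashes
    if len(word) > LONG_WORD:
        return "very long"
    return "ordinary"

def summarize(lines) -> Counter:
    """Count how many words fall into each bucket."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:                      # skip blank lines
            counts[classify(parts[0])] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for kind, n in summarize(fh).most_common():
            print(f"{n:>10}  {kind}")
```

Saved under a name of your choosing and run with word_list as its argument, it prints one count per bucket. A large "all digits" or "mixed letters/digits" count would suggest that identifiers are being indexed.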
Also, you might search the archive using a Subject Only search for "multi
millions words" and also search for BIGHASHSIZE to look at possible tuning
you might be able to do.
Hope this helps.
Bill Moseley
mailto:moseley@hank.org
Received on Wed Jun 12 14:40:14 2002