
Re: More on indexing and memory requirements in

From: Frank Heasley <DrHeasley(at)not-real.chemistry.com>
Date: Thu Aug 31 2000 - 15:28:35 GMT
These are fairly ordinary HTML files, except that their titles can run up to
900 characters. Together they contain approximately 105,000 words.
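To compare the two corpora on the same terms, here is a rough Python sketch (not part of swish; the function name `corpus_stats` and the `*.htm` pattern are assumptions). It reports the file count, a word total split on whitespace much as `wc -w` does, and the length of the longest `<title>`.

```python
import re
import sys
from pathlib import Path

# Matches the first <title>...</title> pair, case-insensitively, across newlines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def corpus_stats(directory, pattern="*.htm"):
    """Return (file_count, total_words, longest_title_chars) for matching files.

    Hypothetical helper: the *.htm pattern is an assumption about the corpus.
    """
    files = sorted(Path(directory).glob(pattern))
    total_words = 0
    longest_title = 0
    for path in files:
        # latin-1 decodes any byte sequence, so odd characters never abort the run.
        text = path.read_text(encoding="latin-1")
        # Roughly the wc -w rule: whitespace-separated tokens, markup included.
        total_words += len(text.split())
        match = TITLE_RE.search(text)
        if match:
            longest_title = max(longest_title, len(match.group(1).strip()))
    return len(files), total_words, longest_title


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    count, words, title = corpus_stats(directory)
    print(f"{count} files, {words} words, longest title {title} chars")
```

Running it against both document directories would show whether the long titles, rather than the total word count, set this corpus apart.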

At 08:04 AM 8/31/00, you wrote:
>At 07:36 AM 08/31/00 -0700, Frank Heasley wrote:
> >OS: Red Hat Linux 6.1
> >PII 233
> >128 MB RAM
> >
> >3,000 files, 1-2k each
> >
> >v1.3.0: 8 minutes, 97.6% of RAM (no Meta indexing)
> >v1.3.2: 74 minutes, 99.7% of RAM (with Meta indexing)
> >v2.0.1: 77 minutes, 99.5% of RAM (with Meta indexing)
>
>Wow, what do you have in those files?  I think something is broken.  What
>else is running?  Do you have any swap space?
>
>I'm running SuSE Linux on a P550 with 128M. I have twice your number of files
>(6414), all about 1-2k each with quite a few meta tags, and it indexes in 33 seconds.
>
>MetaNames SUBJECT TITLE DESCRIPTION URLS IDENTIFIER KEYWORDS CREATOR
>CATEGORY AUTHOR PUBLISHER
>PropertyNames CATEGORY SUBJECT
>
> > wc -w *.htm | grep total
>1239639 total words
>
> > ll | wc -l
>    6434 total files
>
> > ./swish -c swish_no_stem.conf
>Indexing Data Source: "File-System"
>Indexing ../docs..
>Removing very common words...
>8 words removed.
>0 words removed not in common words array:
>
>Writing main index...
>Computing hash table ...
>Writing header ...
>Writing index entries ...
>Writing stopwords ...
>28016 unique words indexed.
>Writing file index...
>Writing file list ...
>Writing file offsets ...
>Writing MetaNames ...
>Writing offsets (2)...
>6414 files indexed.
>Running time: 38 seconds.
>Indexing done!
>
>
>Bill Moseley
>mailto:moseley@hank.org
Received on Thu Aug 31 15:32:48 2000