Re: [SWISH-E:138] Re: size of database

From: Eli's List Clearing House <lch(at)not-real.qz.to>
Date: Tue Feb 03 1998 - 16:18:29 GMT
Nick Phillips (nwp@galleryonline.com) wrote:
> Craig A Summerhill wrote:
> > On Tue, 3 Feb 1998, Jean-Louis Maltret <jlm@eiffel.univ-mrs.fr> wrote:
[question on index sizes]
> > I'm afraid that swish-e would choke the machine (for RAM) if I tried to
> > index it all at once.  I'm finding it to be a real memory hog...

I've divided my site indexing up into a few logical segments. The
largest segment has just under 7000 files totalling 220 megs. The
total index size for that is about 60 megs. I found that the fastest
way to index it was in chunks of about 500 files, and then merge
those. Whatever the documentation says about the memory needed for
merging a database, my machine with 64 megs of RAM and 130 megs of swap
kept running out of memory (even with most services shut down) before
finishing. It would get up to about 6000 files before conking out.
So I have that index in two files.
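
A rough sketch of the batching step, in Python purely for illustration.
It only splits a file list into groups of about 500 and pairs each group
with a partial-index name. The names `chunked`, `plan_batches`, and the
`partNNN.index` files are made up for this example, and the actual
swish-e indexing and merge invocations are deliberately left out.

```python
from typing import Iterator, List, Sequence


def chunked(paths: Sequence[str], size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` paths."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def plan_batches(paths: Sequence[str], size: int = 500) -> List[dict]:
    """Pair each batch with a hypothetical partial-index file name."""
    return [
        {"index": f"part{n:03d}.index", "files": batch}
        for n, batch in enumerate(chunked(paths, size), start=1)
    ]


if __name__ == "__main__":
    # Placeholder file names; a real run would walk the document tree.
    files = [f"doc{i:05d}.html" for i in range(7000)]
    plan = plan_batches(files)
    print(len(plan), "batches; last batch holds", len(plan[-1]["files"]), "files")
```

Each entry in the plan would then be indexed on its own, and the
partial indexes merged afterwards, in more than one pass if a single
merge runs out of memory.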

> One thing you might look at is reducing the size of MAXSTRLEN in swish.h
> -- there are loads of strings stored while swish is indexing, most of
> which are well below the default size of MAXSTRLEN (1000). Try setting
> it to a couple of hundred, and that should significantly reduce memory
> usage, even if it still remains a hog.

Hmmm. Useful tidbit. Next time I modify my swish-e I'll look at that.
Is this just used for storing index terms? If so, why is it so large?
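
For a sense of scale, here is a back-of-envelope calculation, again in
Python. It assumes each stored string occupies its own fixed
MAXSTRLEN-byte buffer, which I have not checked against the swish-e
source, and the string count is an invented figure rather than a
measurement.

```python
def buffer_bytes(num_strings: int, maxstrlen: int) -> int:
    """Bytes used if every string sits in its own maxstrlen-byte buffer."""
    return num_strings * maxstrlen


if __name__ == "__main__":
    strings_held = 50_000  # hypothetical count, not measured from swish-e
    for maxstrlen in (1000, 200):
        mb = buffer_bytes(strings_held, maxstrlen) / 1_000_000
        print(f"MAXSTRLEN={maxstrlen}: {mb:.0f} MB")
```

Under that assumption the saving scales linearly with MAXSTRLEN, so
dropping from 1000 to 200 would cut this part of the footprint to a
fifth. Whether that holds in practice depends on how the buffers are
really allocated.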

> Anyone for Perl?

Perl doing what?

Elijah
------
#!/usr/bin/perl -- -*- my ny.pm sig -*-
$_=$^ ;s;s;sss;;s^.^ju^&&s&P+&\n&&&(s(_..)(ers)||s|^|^^|)&&s(T)(q(st%eg))eg;
s<.(o).><$& new 1$$>i+s+\dst.+$a--||reverse(q(rep k))+ge;s*%.+u* so+*i;s=\++
="me"=mex&&s%ege%l$"hke%;$a||s/^\S+ /\/\//;s;\d+;yor;;s[KE]<ac$&>i;print $_;
Received on Tue Feb 3 08:35:27 1998