
Re: [SWISH-E:276] How well does swish-e scale?

From: Eli's List Clearing House <lch(at)not-real.qz.to>
Date: Tue May 05 1998 - 19:44:24 GMT
> Hi, I recently was trying to set up swish-e 1.1 to index the user web
> pages of this educational site.  I generated a swish conf listing all
> the user web directories without any problems, and ran through
> small-scale tests.  However, when I try to index the whole set of user
> pages (roughly 2500 directories with varying numbers of indexable files
> in them), it tends to just "stop" after a while.  It'll be going

Yup.

> along speedily, and then will start slowing down more and more. I left
> it running for over 3 days at one point, and it had only made it
> through about 1200 of the user directories.

I did the same thing on my site of some 7000 (at the time) text files,
averaging about 20k each.

> The machine it's running on (the web server) is reasonably fast (an
> Indy R4400SC/100) with 96MB of RAM and 128MB of swap. The machine
> isn't running out of swap while doing this (although it does go into
> swap by as much as 50-60MB once it's at 1200 users).

I'm using a mere P200 with 64MB RAM/128MB swap.

> So, is what I'm trying to do not do-able with swish-e?  Does something

Sort of.

> in swish-e's design (maybe it needs to rewrite a big chunk of the index
> file in memory every time it adds something) make it not
> scalable to the level of what I'm doing?   And if it does manage to

That's what I would guess, but I haven't dug into the code. I do know that
it slows to a crawl as soon as it runs out of RAM.

> index it, would it be too slow in searching the index (the size of all
> the stuff being indexed is probably 100 or so megs)?

Search speed only seems to be a problem if people look for words that are
too common. (A good argument for a large stop list. Is there any easy
way to find out the most frequent words in my files by going through
the indexes somehow?)
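
Lacking a way to dump the index, I'd probably just tally the source
files directly. A quick Python sketch -- the glob pattern and the
50-word cutoff are placeholders, adjust for your own tree:

#!/usr/bin/env python
# Tally word frequencies across a set of files and print the most
# common words -- candidates for a swish-e stop list.
import glob
import re
from collections import Counter

counts = Counter()
for path in glob.glob('/home/*/public_html/*.html'):  # hypothetical layout
    try:
        with open(path, errors='ignore') as f:
            counts.update(re.findall(r'[a-z]+', f.read().lower()))
    except OSError:
        continue  # skip unreadable files

# One word per line, most frequent first; paste the keepers into the
# stop list in the conf (IgnoreWords, I think, is the directive).
for word, n in counts.most_common(50):
    print(word, n)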

> Any ideas? I was hoping swish-e would do the trick, since Excite for
> Web Servers had failed abysmally (buggy, and it always died after a
> certain number of entries).

Make many small indexes; I made about a dozen of them. Then use the
index merge feature to create a larger index. Contrary to the
documentation, merging does not use half as much memory as the final
index size; twice as much seems to be the proper figure. I know that it
kept running out of memory on my machine -- otherwise idle -- at the
very end of creating a 60MB index. So I have two 30MB ones instead and
have the searching use both.
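
Concretely, the split looks something like this -- the file names and
paths here are made up, and check the docs for the exact merge switch
(I believe it's -M):

# users-part1.conf -- one conf per slice of the user list
IndexFile /var/indexes/users-part1.idx
IndexDir  /home/alice/public_html
IndexDir  /home/bob/public_html
# ...and so on, generated the same way as the original big conf

# then, from the shell:
swish-e -c users-part1.conf
swish-e -c users-part2.conf
swish-e -M /var/indexes/users-part1.idx /var/indexes/users-part2.idx merged.idx

Until a merge that big succeeds, the search side just has to query each
index in turn and combine the results itself.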

Sometime in the hopefully near future I'll be adding a lot more ram to
this box, and maybe a second CPU, and will be able to merge those
together painlessly.

Elijah
Received on Tue May 5 12:58:31 1998