Re: Swish-e max db size vs. Google App

From: Uwe Dierolf <swishe(at)not-real.ubka.uni-karlsruhe.de>
Date: Sat Apr 23 2005 - 13:54:48 GMT
Dear swish-e users,

Bill asked others who use Swish-e for large collections to reply.

We've created a hybrid OPAC (online public access catalog) for libraries.
It is called XOPAC, where the X stands for extendable.
If you read German, you can find more information about the project
on our project homepage, http://www.xopac.de/

XOPAC uses Swish-e as the search engine and for presenting short titles,
and PostgreSQL for managing the raw data and presenting the full titles.

The OPAC of the university library of Karlsruhe (Germany) holds about
1 million titles. We use the XML feature of Swish-e to search through
our books and journals. One XML record contains all searchable data
for one title, so our XML records are big (about 2 KB per record).
The size of our Swish-e index is about 1 GB.

We've also set up XOPACs for other libraries with about 3 million titles.
The size of these Swish-e indexes is about 2.7 GB.
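
For anyone who wants to scale these figures to their own collection, the
index size per title can be worked out as in the rough sketch below. It
uses only the numbers given above; the function name is made up for
illustration, and decimal units (1 GB = 10**9 bytes) are an assumption.

```python
# Rough index-size arithmetic from the figures quoted in this message.
# Assumption: decimal units, i.e. 1 GB = 10**9 bytes.

def index_bytes_per_title(index_gb: float, titles: int) -> float:
    """Average number of index bytes per indexed title."""
    return index_gb * 1e9 / titles

# Karlsruhe: about 1 GB of index for about 1 million titles.
karlsruhe = index_bytes_per_title(1.0, 1_000_000)

# Other libraries: about 2.7 GB of index for about 3 million titles.
others = index_bytes_per_title(2.7, 3_000_000)

print(f"Karlsruhe: about {karlsruhe:.0f} bytes of index per title")
print(f"Others:    about {others:.0f} bytes of index per title")
```

Both cases come out at roughly 1 KB of index per title. For Karlsruhe that
is about half the size of the 2 KB XML records being indexed.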

In all cases the performance is very good. We use a RAM disk for the main index.
For soundex search we maintain an additional index on the hard disk.

Perhaps this helps others to estimate the power of Swish-e.

Best regards, Uwe Dierolf

------------------------------------------------------------------
Uwe Dierolf
University of Karlsruhe - University Library
P.O.Box 6920, 76049 Karlsruhe, Germany
phone(fax) : 49/721/608-6076(4886)
www        : http://www.ubka.uni-karlsruhe.de/dierolf/
------------------------------------------------------------------

On Fri, Apr 22, 2005 at 12:09:30PM -0700, Bill Moseley wrote:
> > The home page says, "Swish-e is ideally suited for collections of a
> > million documents or smaller."  I've seen posts on the list about 2GB+
> > indexes of ~6 million documents under 2.5.x, along with a comment from
> > Bill that that was pushing the envelope.  Does that reflect reasonable
> > upper limits for current and forthcoming versions respectively?  Am I
> > overlooking something obvious in the documentation?
> 
> I can't really answer.  Swish is not designed to scale to huge
> collections -- for some value of huge.  Clearly if you have a lot more
> RAM, disk, cpu, and time to wait you can index more.  Swish uses hash
> tables that tend to slow down as they get larger.  Try using -S prog
> and a program to generate random docs and have it report changes in
> indexing every few thousand files and you can watch it slow down.
> Using -e helps a lot (in the tests I did) but it still slows down
> after a while.
> 
> Searching also depends entirely on memory.
> 
> Hopefully others that use swish for large collections will reply.
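
The test Bill describes above (feeding randomly generated documents to
-S prog and watching the indexing rate) could be sketched roughly as below.
This is a minimal sketch, not a tested benchmark. It assumes the -S prog
input format of a Path-Name and a Content-Length header, a blank line, then
the document body. The function names, document count, word lengths and
reporting interval are all made up for illustration.

```python
#!/usr/bin/env python3
# Sketch of a random-document generator for "swish-e -S prog".
# Assumption: each record is "Path-Name:" and "Content-Length:" headers,
# a blank line, then exactly Content-Length bytes of document body.
import random
import string
import sys
import time


def random_doc(rng: random.Random, n_words: int = 200) -> str:
    """Return n_words random lowercase words separated by spaces."""
    words = (
        "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 10)))
        for _ in range(n_words)
    )
    return " ".join(words)


def format_record(path: str, body: str) -> bytes:
    """Wrap one document in the headers expected on the prog input stream."""
    data = body.encode("utf-8")
    header = f"Path-Name: {path}\nContent-Length: {len(data)}\n\n"
    return header.encode("ascii") + data


def main(total: int = 100_000, report_every: int = 5_000) -> None:
    rng = random.Random(0)  # fixed seed so runs are repeatable
    out = sys.stdout.buffer
    last = time.monotonic()
    for i in range(1, total + 1):
        out.write(format_record(f"doc{i}.txt", random_doc(rng)))
        if i % report_every == 0:
            out.flush()
            now = time.monotonic()
            # Progress goes to stderr so it does not corrupt the stream.
            print(f"{i} docs, last batch took {now - last:.1f}s", file=sys.stderr)
            last = now


if __name__ == "__main__":
    main()
```

If the script is named as the input program (for example with -i together
with -S prog), writes to the pipe block whenever the indexer falls behind,
so growing batch times on stderr give a rough view of indexing slowing
down. Running it once with -e and once without allows the two to be
compared. No Document-Type header is sent, so the parser used depends on
the Swish-e configuration.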
Received on Sat Apr 23 06:54:49 2005