
Re: Swish-e scalability, performance

From: Aaron Bazar <aaronb(at)>
Date: Mon Nov 15 2004 - 23:52:23 GMT
# SWISH format: 2.4.1
# Search words: (null)
# Index File: nov13
# Name:
# Saved as: nov13
# Total Words: 3259677
# Total Files: 2024134
# Indexed on: 2004-11-13 22:15:49 CST

I just built an index with over 2 million files. Queries are still


Aaron Bazar

-----Original Message-----
From: []
On Behalf Of Bill Moseley
Sent: Monday, November 15, 2004 4:17 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Swish-e scalability, performance

On Mon, Nov 15, 2004 at 01:04:57PM -0800, wrote:
> A couple of the drawbacks with swish-e for a large Web wide search
> tool were that after long (700-800k pages, or a few days) crawls the
> spider would hang or become incredibly slow even on a dual Opteron 242
> with 4GB

Hum, do you think the machine was running low on memory?  The spider
simply keeps a hash of URLs seen, so it's all in memory.  It would be nice
to have it use either a database or BerkeleyDB so that it could be
restarted -- I thought about just using Storable to dump the hash to disk
if it gets a signal to abort.  Then read that back in to continue.
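
A rough sketch of that dump-and-resume idea, under some loud assumptions:
it is written in Python rather than Perl, with pickle standing in for
Storable, and the names SeenStore and install_abort_handler are made up
for illustration.  It is not how the swish-e spider works today, just one
way the approach could look.

```python
import os
import pickle
import signal
import tempfile


class SeenStore:
    """In-memory set of URLs already crawled, with dump/restore to disk.

    Hypothetical helper: pickle plays the role Storable would play in Perl.
    """

    def __init__(self, path):
        self.path = path
        self.seen = set()
        # Resume from an earlier dump if one exists.
        if os.path.exists(path):
            with open(path, "rb") as fh:
                self.seen = pickle.load(fh)

    def add(self, url):
        """Record url; return True only if it had not been seen before."""
        if url in self.seen:
            return False
        self.seen.add(url)
        return True

    def dump(self):
        """Write to a temp file first, then rename it over the target,
        so an interrupted dump never leaves a half-written file behind."""
        dir_name = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=dir_name)
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(self.seen, fh)
        os.replace(tmp, self.path)


def install_abort_handler(store):
    """Dump the seen-set when the process receives SIGINT or SIGTERM."""

    def handler(signum, frame):
        store.dump()
        raise SystemExit(1)

    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
```

A crawl loop would call store.add(url) before fetching and skip the URL
when it returns False.  This only makes the crawl restartable; the set
still lives in memory, so it would not help if the machine really was
running out of RAM.  That case is where a database or BerkeleyDB would
come in.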

> To be fair I don't think the original intent of swish-e was to be a 
> Web wide level search tool, but it does a pretty good job up to a 
> million or two pages.

That's the bottom line. Kevin wrote the original swish in a weekend or so
and the basic design hasn't really changed.  Things are faster, but that's
about it.  That's kind of a problem, as you can evaluate swish and it looks
real fast compared to other indexers, but then you hit some limit and it
slows down real fast.

I'm always amazed when people post that they are using it for millions of

Bill Moseley

Received on Mon Nov 15 15:52:30 2004