Re: converting .temp indices to usable indices

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun Dec 07 2003 - 06:34:41 GMT

On Sat, Dec 06, 2003 at 09:23:01PM -0800, Dave Stevens wrote:
> > Pushing the design of swish-e, perhaps.  Seems like more people are
> > using swish-e for large collections.  How much RAM are you using?
> 
> This machine is a single Athlon XP 1800+ with an inexpensive Asus K7
> board and only 512 MB of RAM.

I assume you are using -e when indexing.

> > Did you look at Inktomi?  It uses a database that is searchable as it is
> > indexed.
> 
> No, but I will.  I've looked at Nutch but it doesn't seem like much is
> going on, though they do post snaps fairly regularly and I've heard they
> have anonymous access to CVS but haven't used it.  Last year I looked at
> Google appliances for my former sites and it was a couple hundred grand
> for a two year license.  At this point there is no investment in the
> current project other than me (don't even have a business model yet) so I
> need to make it work with a free sort of software license.  I'm willing to
> spend a few grand in hardware and move back into a colo and support that
> (about half a dozen boxes live in what was my dining room) but I won't be
> able to afford licensing any enterprise level software.

On such a large scale you need something that lets you update the index 
incrementally.  Frankly, if the documents are available locally, I think 
completely reindexing with swish-e is often as fast as updating other 
types of indexes.  Maybe.

Another one to look at, if you can stand Java, is Lucene.  I haven't tried 
it, but their goal is an open source, large-scale search engine.  Hey, Bob 
Dylan's site uses it (although I could not get it to work).

  http://jakarta.apache.org/lucene/docs/index.html


-- 
Bill Moseley
moseley@hank.org
Received on Sun Dec 7 06:34:47 2003