Re: [swish-e] Swish3 vs Omega

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Feb 05 2009 - 04:49:32 GMT
I'm cc'ing the list on this since my reply will likely be of some use to others.

Kevin Bowling wrote on 2/4/09 9:33 PM:
> Hi,
> 
> I have a fairly large collection that I index using Swish-e 2.4.5.  It works 
> fairly well, but the indexing speed seems average (only one core) and the 
> search results are just mediocre.  Obviously that version is quite old and 
> could use some updates.  I am hoping Swish3 will be that update.  
> 

Swish3 won't be any faster at indexing, at least as it stands now. Xapian is a
good deal slower than Swish-e at indexing in my tests. Whether the Xapian-backed
Swish3 will be any improvement depends on what you mean by "search results are
mediocre". Search speed should be comparable. Ranking should be somewhat
better.

What Swish3 does that Swish-e 2.4.x does not is offer native UTF-8 and
incremental indexing support, scalable index size (Swish-e doesn't scale well
past about 1M docs), plus search bindings in many different languages (not just
C and Perl).

So there are tradeoffs. Swish-e 2.4.x is about as fast as you can get wrt
indexing and search speed. Swish3 trades some speed for full UTF-8 and lots more
flexibility and scalability.


> What I am confused about is that it now uses Xapian.  I haven't tried Xapian 
> but I know they have their own system called Omega.  How does Swish3 differ 
> from it?  I just need a local filesystem indexer for a website with 200k+ 
> HTML, PDF, TXT and PS files.  Are Swish-e and Omega the only two FOSS 
> contenders?

Oh no, there are many: Lucene and its clones, KinoSearch, and HyperEstraier
(though it seems to have fallen out of support), among others.

Swish3 offers a few things that Omega does not: MetaNames and PropertyNames,
for one; via SWISH::Prog, an aggregation framework for http, mail, and rdbms
sources as well as the filesystem; and a single config file.

If all you need is a local filesystem indexer for a website with 200k+ docs
(which I would call medium-sized -- these days folks deal with multi-million doc
collections), and you don't need UTF-8 or incremental indexing, Swish-e 2.4.5 is
about as good as it gets. Don't let its age fool you. :)

FWIW, current SVN has some fixes/improvements over 2.4.5. There's actually a
2.4.6 tagged version that just hasn't ever made it to a fully-announced release
since we had some problems with the Windows build.

http://svn.swish-e.org/swish-e/tags/rel-2.4.6/

hope that helps.
pek

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Wed Feb 4 23:49:43 2009