I'm cc'ing the list on this since my reply will likely be of some use to others.

Kevin Bowling wrote on 2/4/09 9:33 PM:
> Hi,
>
> I have a fairly large collection that I index using Swish-e 2.4.5. It works
> fairly well, but the indexing speed seems average (only one core) and the
> search results are just mediocre. Obviously that version is quite old and
> could use some updates. I am hoping Swish3 will be that update.
>
Swish3 won't be any faster at indexing, at least as it stands now. Xapian is a
good deal slower than Swish-e in my tests. And I guess it depends on what you
mean by "search results are mediocre" as to whether the Xapian-backed Swish3
will be any improvement. Speed should be comparable. Ranking should be somewhat
better.

What Swish3 does that Swish-e 2.4.x does not is offer native UTF-8 and
incremental indexing support, scalable index size (Swish-e doesn't scale well
past about 1M docs), plus search bindings in many different languages (not just
C and Perl).

So there are tradeoffs. Swish-e 2.4.x is about as fast as you can get wrt
indexing and search speed. Swish3 trades some speed for full UTF-8 and lots more
flexibility and scalability.

> What I am confused about is that it now uses Xapian. I haven't tried Xapian
> but I know they have their own system called Omega. How does Swish3 differ
> from it? I just need a local filesystem indexer for a website with 200k+
> HTML, PDF, TXT and PS files. Are Swish-e and Omega the only two FOSS
> contenders?

Oh no. There are many: Lucene and its clones, KinoSearch, HyperEstraier (though
it seems to have fallen out of support), and plenty of others.

Swish3 offers a few things that Omega does not: MetaNames and PropertyNames,
for one; via SWISH::Prog, an aggregation framework for HTTP, mail, and RDBMS
sources as well as the filesystem; and a single config file.

If all you need is a local filesystem indexer for a website with 200k+ docs
(which I would call medium-sized -- these days folks deal with multi-million doc
collections), and you don't need UTF-8 or incremental indexing, Swish-e 2.4.5 is
about as good as it gets. Don't let its age fool you. :)

FWIW, current SVN has some fixes/improvements over 2.4.5. There's actually a
2.4.6 tagged version that just hasn't ever made it to a fully-announced release
since we had some problems with the Windows build.

http://svn.swish-e.org/swish-e/tags/rel-2.4.6/

hope that helps.
pek
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Feb 4 23:49:43 2009