> On such a large scale you need something where you can incrementally
> update the index. Frankly, if documents are available locally I think
> completely reindexing with swish-e is often as fast as updating other
> types of indexes. Maybe.
>
> Another to look at, if you can stand java, is Lucene. I haven't tried
> it but their goal is an Open Source large-scale search engine. Hey, Bob
> Dylan's site uses it (although I could not get it to work).
There are also Mnogosearch, ht://dig, Harvest and Nutch:
- Mnogosearch is extremely slow (both indexing and searching) and completely
unusable for more than 100,000 pages.
- ht://dig has no duplicate detection, but it is the fastest crawler I have
ever seen; search speed is also fine, though it uses a lot of resources.
- I have only just started testing Harvest, but its docs state that Swish is faster.
- Nutch is promising and very fast, but it is not even at the beta stage yet.
Received on Sun Dec 7 10:43:38 2003