Swish-E with incremental index building

From: Tilo Muetze <tmuetze(at)not-real.alanti.net>
Date: Mon Nov 29 2004 - 15:00:07 GMT
Hi,
I've looked through the FAQ and the discussion list archive, but haven't 
found a definitive answer, so hopefully you can help us out.

We already use Swish-E to index a small amount (~300 MB) of various 
files (.doc, .ppt, etc.). To handle that, we have set up an hourly job in 
our scheduler which recreates the index from scratch every time. The index 
has to be rebuilt every hour because files on the filesystem change frequently.

We now need to index quite large filesystems (> 1 GB) and some huge 
intranet websites, but the index still has to be rebuilt every hour. The 
problem with our current approach is the time it takes to build the index 
from scratch, which exceeds the one-hour window.

I've read a lot about incremental indexing, which seems to be exactly 
what we need: we build the index once and afterwards only add the new 
documents and remove the deleted ones.
http://www.swish-e.com/Discussion/archive/2004-10/8413.html
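To make the incremental idea concrete, here is a minimal sketch of the step that would run before each hourly update: work out which files were added, changed, or removed since the previous run, so that only those need to be handed to the indexer. This is not Swish-E code. The helper names `scan` and `diff` are hypothetical, change detection by modification time is an assumption, and how the resulting lists would be passed to Swish-E (or how the previous snapshot is stored between runs) is not shown.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical snapshot format: file path -> modification time (seconds).
Snapshot = Dict[str, float]


def scan(root: str) -> Snapshot:
    """Walk `root` and record the mtime of every file found."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                snap[path] = os.stat(path).st_mtime
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return snap


def diff(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Compare two snapshots; return (added, changed, removed) paths, sorted."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    # Assumption: a different mtime means the document must be re-indexed.
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, removed
```

In this sketch, "changed" files would typically be treated as a removal followed by a re-add, so a full rebuild is only needed once and each hourly run touches just the documents that actually moved.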

So if I build Swish-E with --enable-incremental and later use -r to make 
sure "old" documents get removed from the index, is that enough to handle 
this amount of data? Or do you still see problems, apart from the fact 
that 2.5 is still a beta?

A colleague has already evaluated http://search.mnogo.ru/, which seems to 
do exactly what we need, but it is still completely uncharted territory 
for us, so it would be great if you had some ideas on how we can 
accomplish this with Swish-E.

Best Regards,
Tilo Muetze
Received on Mon Nov 29 07:00:17 2004