RE: -v option for merges

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Feb 14 2002 - 22:25:25 GMT
At 02:04 PM 02/14/02 -0800, Robert Ln wrote:
>Well actually, at least for me, the only level I was concerned with was
>-v 0, in which I would like to see nothing at all :)

Makes sense.  Actually, -v 0 for normal indexing isn't quite nothing:

 > ./swish-e -i test.html -v0
Indexing Data Source: "File-System"
Indexing done!


>Another comment is that for anything but the highest level (4 or debugging)
>it might be nice not to do the (I am assuming) CR but no LF trick
>to 'animate' the percent progress

I wish there was a better way to do that (besides not doing it at all ;).  I
tried to find out if there was a portable way swish could detect that its
output was being captured and turn off the percent progress.  But it's
really a separate issue from -v, since you might not want -v 2 or above
(where you see too much detail about each file) but still want to see that
it's making progress.  That's helpful when indexing a very large number of
files.
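
For illustration only, here is a minimal Python sketch of the "detect captured
output" idea.  It is not swish-e code; the function name report_progress and
the message format are made up for the example.  It assumes that "captured"
means "not attached to a terminal", which is what isatty() reports.

```python
import io
import sys


def report_progress(done, total, stream=sys.stdout):
    """Write a percent-progress update only when `stream` is a terminal."""
    # isatty() is False for pipes, regular files, and in-memory buffers.
    if not stream.isatty():
        return False
    percent = 100 * done // total
    # "\r" returns the cursor to column 0 so the next update overwrites this one.
    stream.write(f"\r{percent:3d}% complete")
    stream.flush()
    return True


# Captured output (here an in-memory buffer) receives no progress updates.
captured = io.StringIO()
wrote = report_progress(50, 100, stream=captured)
```

A check like this stays independent of the verbosity level, so progress could
still be shown on a terminal at a low -v setting.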

There could be another switch to turn that off.  Or what I do is just pipe
the output through a filter that just strips those lines.
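
A filter of that kind could look roughly like the Python sketch below.  It is
not the filter referred to above; strip_progress and the sample text are
hypothetical.  It assumes each progress update ends in a bare carriage return
and keeps only what follows the last one on each line.

```python
def strip_progress(text):
    """Drop carriage-return-animated updates, keeping the final text of each line."""
    cleaned = []
    for line in text.split("\n"):
        # Everything before the last "\r" is treated as an overwritten update.
        cleaned.append(line.rsplit("\r", 1)[-1])
    return "\n".join(cleaned)


# Made-up sample, not real swish-e output: three updates, then a final message.
sample = "start\n 10%\r 50%\r100%\rdone\n"
result = strip_progress(sample)
```

When reading from a real pipe, read sys.stdin.buffer and decode it yourself,
since Python's text-mode stdin translates a lone "\r" into "\n" by default.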

>If I read your previous mail correctly, however, can I assume that
>if I have a pretty big master index file, create incremental index files
>several times during the day, then want to update the master index file
>with the incremental indices, it is better to do a full index on all
>files, rather than merge?

In many cases, yes.  Merging saves you the time of parsing (which isn't
much unless you are spidering), but you don't get the in-memory compression
that's done during normal indexing.  On my machine with 128M, I can index
25,000 files in less than four minutes and that uses about 75M of RAM.  But
if I try to merge that index with another (even small) index my memory
usage goes way up and the machine swaps like crazy and takes more time than
I'm willing to wait.

The problem is that swish (thanks to Jose) does a lot of compression after
parsing each file.  But while merging it can't do that optimization because
merge is working with individual words, and not files.  There are probably
ways to solve that, or make it better, but the goal is instead to find a
way to do real incremental indexing, which should eliminate the need for
merge.

Incremental indexing is still a ways off, though.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Thu Feb 14 22:25:45 2002