Re: Multiple index files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Feb 20 2004 - 03:06:00 GMT

On Thu, Feb 19, 2004 at 08:04:28AM -0800, Bill Pavich wrote:
> Question regarding the index files. I will be indexing new local
> documents each night. Does it matter, performance wise, if I create a
> separate index for each night and just specify my indexes to swish-e
> using a wildcard like so: swish-e -f /opt/vnews-index/*.index -w 'search
> words'
> 
> Or, should I merge the indexes together each night so that I just
> present one ever growing index to swish-e??

Or just reindex completely every night?

Searching multiple indexes has the overhead of opening each index file.
Internally swish searches them sequentially and maintains separate
sorted result lists.  For display, the lists are merged together.

When sorting a single index by a property that's stored in the index (as
opposed to, say, rank, which is calculated at search time), swish uses a
table that it created at indexing time.  That table allows sorting the
results by comparing integers instead of sort strings.  That's helpful
for large result sets.  When sorting results from multiple indexes, the
pre-sorted tables are used to sort each result set, but when generating
the final results the individual result sets are merged (tape merged).
That merge cannot use the pre-sorted tables, because a table's integers
only rank entries within its own index, so it must read the property
(.prop) file for each result.  This can slow things down.

How's that for clear writing? 
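
If it helps, here is a rough Python sketch of the idea described above.
It is not swish-e's actual code; the Index class, its sort_table and
read_prop(), and the sample data are all made up for illustration.  Each
index sorts its own hits using only the integers in its pre-sorted
table, but the merge across indexes has to fetch the real property
value for every result, because the integers from one index mean
nothing in another.

```python
import heapq


class Index:
    """Hypothetical stand-in for one index file (not a swish-e API)."""

    def __init__(self, name, props):
        self.name = name
        self.props = props  # file id -> stored property string
        # Built once at "indexing time": file id -> integer sort position.
        ordered = sorted(props, key=lambda fid: props[fid])
        self.sort_table = {fid: pos for pos, fid in enumerate(ordered)}
        self.prop_reads = 0  # counts simulated .prop file reads

    def read_prop(self, fid):
        # Simulates the comparatively expensive property lookup.
        self.prop_reads += 1
        return self.props[fid]

    def sorted_hits(self, hits):
        # Within one index, sorting compares integers only.
        return sorted(hits, key=lambda fid: self.sort_table[fid])


def keyed_stream(index, hits):
    # Yield this index's hits in sorted order, tagged with the real
    # property value so they can be compared against other indexes.
    for fid in index.sorted_hits(hits):
        yield (index.read_prop(fid), index.name, fid)


def merged_results(indexes_and_hits):
    # Merge the already-sorted per-index streams into one sorted list.
    streams = [keyed_stream(ix, hits) for ix, hits in indexes_and_hits]
    return list(heapq.merge(*streams))


# Made-up sample data: two nightly indexes and the hits a search found.
a = Index("mon", {1: "apple", 2: "cherry", 3: "fig"})
b = Index("tue", {1: "banana", 2: "date"})
results = merged_results([(a, [3, 1, 2]), (b, [2, 1])])
print(results)
print("property reads:", a.prop_reads + b.prop_reads)
```

In this sketch the number of property reads equals the total number of
results, which is the extra cost a single index avoids.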

You might also want to try enabling the incremental indexing mode -- run
./configure --help and look at the options.  In that mode you can add
files to an existing index with the -u option.  Documentation is sparse
on that feature.

-- 
Bill Moseley
moseley@hank.org
Received on Thu Feb 19 19:06:00 2004