Re: [swish-e] trading off nr vs size of index files

From: Josh Rabinowitz <joshr-swishe(at)>
Date: Wed Feb 20 2008 - 17:22:04 GMT
Hello, Judith:

I did some testing a few years ago and (if memory serves) found that 
searching on two indexes was about 10-20% slower than searching on 
the same data in one index.
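
If you want to measure this on your own data, a rough timing harness along these lines may help. It is only a sketch, not the test I ran back then: the index file names and the search word are hypothetical placeholders, and it assumes a swish-e binary on your PATH, invoked with -w for the query and -f for one or more index files.

```python
import shutil
import statistics
import subprocess
import sys
import time


def median_runtime(cmd, repeats=5):
    """Return the median wall-clock seconds over `repeats` runs of cmd."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown_percent(baseline, candidate):
    """Relative slowdown of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    if shutil.which("swish-e") is None:
        sys.exit("swish-e not found on PATH; nothing to measure")

    # Hypothetical file names: one combined index versus the same
    # documents split across two indexes.
    one_index = ["swish-e", "-w", "test", "-f", "all.index"]
    two_indexes = ["swish-e", "-w", "test", "-f",
                   "part1.index", "part2.index"]

    t_one = median_runtime(one_index)
    t_two = median_runtime(two_indexes)
    print(f"one index:   {t_one:.3f}s")
    print(f"two indexes: {t_two:.3f}s")
    print(f"slowdown:    {slowdown_percent(t_one, t_two):.1f}%")
```

Repeating the comparison with 2, 4, 8, ... indexes over the same documents should give a rough idea of whether the slowdown grows linearly with the number of indexes. Use queries that are representative of your users, since timings for a single word may not generalise.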

Hope this helps,

At 7:34 AM +0200 2/20/08, Judith Retief wrote:
>  We're indexing about 2.5 million files at the moment, and we're 
>probably going to end up with about 6 million files eventually.
>  We categorise our files by certain criteria and index them into 
>separate index files by category. A user then selects the categories 
>that he's interested in, and we only have to search through those 
>index files. This cuts down on the search and merging time.
>  Now we want to optimise our categorisation criteria to find the 
>granularity that works best for swish, trading off:
>  - searching/merging against a few large index files vs
>  - searching/merging against many small index files
>The more indexes you have, the more specific the user can get with 
>his query, and the more irrelevant indexes we can skip. 
>But if the user happens to be interested in everything, you have to 
>search through a lot of small indexes.
>  Does anyone know whether swish shows a linear or an exponential 
>speed degradation when searching/merging larger index files vs 
>a larger number of index files?
>  Thanks
>  Judith

-- Josh Rabinowitz                  --
-- --
-- SkateTalk Chat Systems(tm)    --
Received on Wed Feb 20 12:28:31 2008