Re: Fuzzy Indexing Questions

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu May 08 2003 - 06:39:52 GMT
On Wed, May 07, 2003 at 05:18:58PM -0700, John Movius wrote:

> Does anyone have any stats on the relative size of a regular SWISH-e
> index vs. a fuzzy SWISH-e index?  I realize this could vary
> considerably.   


Here's another sample of about 10,000 entries, indexed with and without stemming.

    8170019 May  7 23:06 index.swish-e
    1519304 May  7 23:06 index.swish-e.prop

    8643319 May  7 23:09 index_no_stem.swish-e
    1519304 May  7 23:09 index_no_stem.swish-e.prop

As you can see, the sizes are not much different: the stemmed index is only about 5% smaller.  One bummer is that the .prop file is duplicated for each index.
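
For anyone who wants the exact numbers, here is a quick sketch of the arithmetic in Python. The byte counts are copied from the listing above; the variable names are just for illustration.

```python
# File sizes in bytes, copied from the directory listing above.
stemmed = 8170019      # index.swish-e
unstemmed = 8643319    # index_no_stem.swish-e
prop = 1519304         # each .prop file (same size for both indexes)

# How much smaller is the stemmed index than the unstemmed one?
saved = unstemmed - stemmed
pct = 100.0 * saved / unstemmed
print(f"stemmed index is {saved} bytes ({pct:.1f}%) smaller")

# Disk used when keeping both indexes, including the duplicated .prop file.
both = stemmed + unstemmed + 2 * prop
print(f"both indexes together: {both} bytes, of which {prop} is a duplicate .prop")
```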

It would not be too much of a hack to get swish to create a single index that included both
the stemmed and the unstemmed words.  It could just use metanames to store the different
versions of the same word internally.
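
To make the idea concrete, here is a toy sketch in Python. It is not how swish stores anything internally: the metanames "exact" and "stem", the functions, and the crude suffix-stripping stemmer are all made up for illustration. The point is only that one index can hold both forms of each word if every entry is keyed by a metaname as well as the word.

```python
from collections import defaultdict

def toy_stem(word):
    """Crude suffix stripper, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    """Map (metaname, word) -> set of document ids, storing both word forms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[("exact", word)].add(doc_id)           # unstemmed form
            index[("stem", toy_stem(word))].add(doc_id)  # stemmed form
    return index

def search(index, word, fuzzy=False):
    """Look the word up under the hypothetical 'stem' or 'exact' metaname."""
    word = word.lower()
    key = ("stem", toy_stem(word)) if fuzzy else ("exact", word)
    return sorted(index.get(key, set()))

docs = {
    1: "indexing large documents",
    2: "the index document",
    3: "indexed words",
}
index = build_index(docs)
print(search(index, "index"))              # exact match only: [2]
print(search(index, "index", fuzzy=True))  # stemmed match: [1, 2, 3]
```

The word list roughly doubles in this scheme, but there is only one set of documents behind it, so nothing like the .prop data would need to be stored twice.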


-- 
Bill Moseley
moseley@hank.org
Received on Thu May 8 06:44:36 2003