Re: fix for my stemmer_en2 issue

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Nov 11 2006 - 15:48:48 GMT
On Fri, Nov 10, 2006 at 09:24:12PM -0800, Peter Karman wrote:
> The difference when I put them back in however was that instead of being 
> FUZZY_STEMMING_EN they were changed to FUZZY_STEMMING_EN2. FUZZY_STEMMING_EN was 
> dropped from stemmer.h at the same time.
> 
> To make matters more confusing, the error message indicates that the deprecated 
> features Stemming_en and Stem will use Stemmer_en1 -- but they are marked with 
> FUZZY_STEMMING_EN2 even though they call the same init/free functions as 
> Stemmer_en1.

Oh, that's not good.


> 
> So, there's definitely something suspicious in stemmer.c I think. I'm going to 
> commit a change to CVS -- Brad, would you take a look at the CVS version and see 
> if that works any better?

This will require re-indexing.  That table maps the configuration
names to an index number used to indicate the stemmer -- and that
number is stored in the index to know what stemmer to use when
searching.

Brad's original config had:

    FuzzyIndexingMode Stemming_en2

which mapped to the "english" stemmer and stored FUZZY_STEMMING_EN2 in
the index.  Then, at search time, FUZZY_STEMMING_EN2 was looked up in
the table and matched the "porter" stemmer instead, as could be seen in
his headers:

    # Fuzzy Mode: Stemming_en

Indexing with one stemmer and searching with another could cause
problems.  What I'm still confused about is why the size of the index
would have made a difference.

Peter, that fuzzy_mode index must match up to only one stemmer, but
there can be multiple entries for a given fuzzy_mode to allow for
aliases (Stem, Stemming_en, Stemming_en1, for example).
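To make that rule concrete, here is a minimal sketch in C.  It is not
the actual stemmer.c code: the struct layout, the function names, the
enum values, and the order of the table entries are all assumptions
made up for illustration.  Only the config names, the FUZZY_STEMMING_*
identifiers, and the "porter"/"english" stemmer names come from this
thread.  The "bad" table mimics the situation Peter described; the
"good" table follows the rule above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mode numbers; the real values are defined in stemmer.h. */
enum fuzzy_mode { FUZZY_NONE = 0, FUZZY_STEMMING_EN1, FUZZY_STEMMING_EN2 };

/* One row per configuration name (illustrative layout, not swish-e's). */
struct fuzzy_entry {
    const char     *name;     /* name accepted by FuzzyIndexingMode      */
    enum fuzzy_mode mode;     /* number stored in the index              */
    const char     *stemmer;  /* stemmer selected when that mode is read */
};

#define COUNT(t) (sizeof(t) / sizeof((t)[0]))

/* Aliases share a mode, and every row for a mode names the same stemmer. */
const struct fuzzy_entry good_table[] = {
    { "None",         FUZZY_NONE,         "none"    },
    { "Stemming_en",  FUZZY_STEMMING_EN1, "porter"  },
    { "Stem",         FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en1", FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en2", FUZZY_STEMMING_EN2, "english" },
};

/* The deprecated aliases carry FUZZY_STEMMING_EN2 but still mean "porter". */
const struct fuzzy_entry bad_table[] = {
    { "None",         FUZZY_NONE,         "none"    },
    { "Stemming_en",  FUZZY_STEMMING_EN2, "porter"  },
    { "Stem",         FUZZY_STEMMING_EN2, "porter"  },
    { "Stemming_en1", FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en2", FUZZY_STEMMING_EN2, "english" },
};

/* Index time: config name -> table row (exact match, for simplicity). */
const struct fuzzy_entry *find_by_name(const struct fuzzy_entry *t,
                                       size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0)
            return &t[i];
    return NULL;
}

/* Search time: mode number read from the index -> first matching row. */
const struct fuzzy_entry *find_by_mode(const struct fuzzy_entry *t,
                                       size_t n, enum fuzzy_mode mode)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].mode == mode)
            return &t[i];
    return NULL;
}

/* Returns 1 if no two rows share a mode while naming different stemmers. */
int table_is_consistent(const struct fuzzy_entry *t, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (t[i].mode == t[j].mode &&
                strcmp(t[i].stemmer, t[j].stemmer) != 0)
                return 0;
    return 1;
}
```

With bad_table, looking up "Stemming_en2" by name gives the "english"
stemmer, but looking up the stored mode again lands on the
"Stemming_en" row and the "porter" stemmer.  A check along the lines of
table_is_consistent() would flag that table, while good_table passes.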

-- 
Bill Moseley
moseley@hank.org

Received on Sat Nov 11 07:48:53 2006