On Fri, Nov 10, 2006 at 09:24:12PM -0800, Peter Karman wrote:
> The difference when I put them back in however was that instead of being
> FUZZY_STEMMING_EN they were changed to FUZZY_STEMMING_EN2. FUZZY_STEMMING_EN was
> dropped from stemmer.h at the same time.
>
> To make matters more confusing, the error message indicates that the deprecated
> features Stemming_en and Stem will use Stemmer_en1 -- but they are marked with
> FUZZY_STEMMING_EN2 even though they call the same init/free functions as
> Stemmer_en1.
Oh, that's not good.
>
> So, there's definitely something suspicious in stemmer.c I think. I'm going to
> commit a change to CVS -- Brad, would you take a look at the CVS version and see
> if that works any better?
This will require re-indexing. That table maps the configuration
names to a number that identifies the stemmer, and that number is
stored in the index so the search side knows which stemmer to use.
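A minimal C sketch of that two-way lookup, for illustration only. The struct, enum values and table contents are hypothetical (the real definitions live in stemmer.c/stemmer.h and may differ); only the config names come from this thread. At index time the config name is looked up and its mode number is stored; at search time the stored number is looked up again, and the first entry with that number decides the stemmer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode numbers; the real constants are in stemmer.h. */
enum fuzzy_mode {
    FUZZY_NONE = 0,
    FUZZY_STEMMING_EN1,   /* original Porter stemmer */
    FUZZY_STEMMING_EN2    /* "english" stemmer */
};

/* Hypothetical table entry. */
struct fuzzy_entry {
    const char *config_name;  /* value given to FuzzyIndexingMode */
    enum fuzzy_mode mode;     /* number written into the index */
    const char *stemmer;      /* stemmer actually applied to words */
};

/* Several config names may share one mode number (aliases). */
static const struct fuzzy_entry fuzzy_table[] = {
    { "None",         FUZZY_NONE,         "none"    },
    { "Stem",         FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en",  FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en1", FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en2", FUZZY_STEMMING_EN2, "english" },
};

#define FUZZY_TABLE_LEN (sizeof fuzzy_table / sizeof fuzzy_table[0])

/* Index time: config name -> entry whose mode gets stored. */
const struct fuzzy_entry *lookup_by_name(const char *name)
{
    for (size_t i = 0; i < FUZZY_TABLE_LEN; i++)
        if (strcmp(fuzzy_table[i].config_name, name) == 0)
            return &fuzzy_table[i];
    return NULL;
}

/* Search time: mode read back from the index -> first matching entry. */
const struct fuzzy_entry *lookup_by_mode(enum fuzzy_mode mode)
{
    for (size_t i = 0; i < FUZZY_TABLE_LEN; i++)
        if (fuzzy_table[i].mode == mode)
            return &fuzzy_table[i];
    return NULL;
}
```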
Brad's original config had:
FuzzyIndexingMode Stemming_en2
which mapped to the "english" stemmer and stored FUZZY_STEMMING_EN2 in
the index. Then, at search time, FUZZY_STEMMING_EN2 was looked up in the
table and matched the "porter" stemmer, as can be seen in his headers:
# Fuzzy Mode: Stemming_en
That mismatch could cause problems. What I still don't understand is
why the size of the index would have made a difference.
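To make the failure concrete, here is a hedged C sketch that replays it with a hypothetical table. It assumes the Porter aliases are tagged FUZZY_STEMMING_EN2 (as Peter described) and, as a further assumption, that they sit ahead of the Stemming_en2 entry. Indexing with Stemming_en2 stores FUZZY_STEMMING_EN2, and the search-time lookup by that number then lands on the first alias, Stemming_en, with the porter stemmer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode numbers and entry layout. */
enum fuzzy_mode { FUZZY_NONE, FUZZY_STEMMING_EN1, FUZZY_STEMMING_EN2 };

struct fuzzy_entry {
    const char *config_name;
    enum fuzzy_mode mode;
    const char *stemmer;
};

/* Mis-tagged table: the Porter aliases carry FUZZY_STEMMING_EN2. */
static const struct fuzzy_entry bad_table[] = {
    { "Stemming_en",  FUZZY_STEMMING_EN2, "porter"  },  /* wrong mode */
    { "Stem",         FUZZY_STEMMING_EN2, "porter"  },  /* wrong mode */
    { "Stemming_en1", FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en2", FUZZY_STEMMING_EN2, "english" },
};

#define BAD_LEN (sizeof bad_table / sizeof bad_table[0])

/* Simulate index then search: name -> stored mode -> first entry
 * with that mode. Returns NULL for an unknown config name. */
const struct fuzzy_entry *entry_seen_at_search(const char *indexed_name)
{
    enum fuzzy_mode stored = FUZZY_NONE;
    int found = 0;

    for (size_t i = 0; i < BAD_LEN; i++) {
        if (strcmp(bad_table[i].config_name, indexed_name) == 0) {
            stored = bad_table[i].mode;  /* what gets written to the index */
            found = 1;
            break;
        }
    }
    if (!found)
        return NULL;

    for (size_t i = 0; i < BAD_LEN; i++)
        if (bad_table[i].mode == stored)
            return &bad_table[i];        /* first match wins */
    return NULL;
}
```

In this sketch, reordering the table would only hide the problem; the aliases would still need to carry the Porter mode number.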
Peter, each fuzzy_mode number must map to only one stemmer, but
there can be multiple entries for a given fuzzy_mode to allow for
aliases (Stem, Stemming_en, Stemming_en1, for example).
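That rule can be checked mechanically. A minimal sketch, assuming a hypothetical table of (config name, mode, stemmer) entries: aliases that share a mode are accepted as long as they name the same stemmer, and the first entry that reuses a mode for a different stemmer is reported. The function and the two example tables are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode numbers and entry layout. */
enum fuzzy_mode { FUZZY_NONE, FUZZY_STEMMING_EN1, FUZZY_STEMMING_EN2 };

struct fuzzy_entry {
    const char *config_name;
    enum fuzzy_mode mode;
    const char *stemmer;
};

/* Returns the index of the first entry whose mode was already claimed
 * by an earlier entry with a different stemmer, or -1 if every mode
 * maps to exactly one stemmer. Same mode + same stemmer = alias, OK. */
int find_mode_conflict(const struct fuzzy_entry *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (table[j].mode == table[i].mode &&
                strcmp(table[j].stemmer, table[i].stemmer) != 0)
                return (int)i;
    return -1;
}

/* Example: aliases share FUZZY_STEMMING_EN1 and all use porter. */
const struct fuzzy_entry consistent_table[] = {
    { "Stem",         FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en",  FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en1", FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en2", FUZZY_STEMMING_EN2, "english" },
};

/* Example: a porter alias mis-tagged with FUZZY_STEMMING_EN2. */
const struct fuzzy_entry clashing_table[] = {
    { "Stemming_en",  FUZZY_STEMMING_EN2, "porter"  },
    { "Stemming_en1", FUZZY_STEMMING_EN1, "porter"  },
    { "Stemming_en2", FUZZY_STEMMING_EN2, "english" },
};
```

Run at startup or in a unit test, a check like this would have reported the clash between the mis-tagged porter alias and Stemming_en2 before any index was built.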
--
Bill Moseley
moseley@hank.org
Unsubscribe from or help with the swish-e list:
http://swish-e.org/Discussion/
Help with Swish-e:
http://swish-e.org/current/docs
swish-e@sunsite.berkeley.edu
Received on Sat Nov 11 07:48:53 2006