Re: Stemming - Varying Results

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Oct 04 2005 - 20:40:14 GMT
On Tue, Oct 04, 2005 at 01:11:16PM -0700, Antonio Barrera wrote:
> I am using Stemming_en, a search for "Environmental" includes the 6.xml in
> the results, but not 398.xml.  A search for environment, returns 398.xml,
> but not 6.xml.  In the live version, Environment returns 22 hits,
> Environmental 30.  Shouldn't stemming result in the same number of hits?

That depends on the words and on how the stemmer handles them.

moseley@bumby:~$ cat c
FuzzyIndexingMode stemming_en
moseley@bumby:~$ cat words
Environment
Environmental

moseley@bumby:~$ swish-e -T indexed_words -c c -i words -v0
    Adding:[1:swishdefault(1)]   'environ'   Pos:5  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'environment'   Pos:6  Stuct:0x9 ( BODY FILE )

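So the stemmer reduces "Environment" to 'environ' and "Environmental" to
'environment'.  It considers those to be different words, so the two
searches match different documents.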
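
To illustrate why this can happen, here is a toy sketch in Python.  It is
not the Swish-e stemmer and not a real stemming algorithm; the suffix list,
the min_stem cutoff and the function name strip_one_suffix are made up for
this example.  It assumes a stemmer that removes at most one suffix in a
given step, which is enough to leave the two words with different stems.

```python
# Toy illustration only: this is NOT the stemmer Swish-e uses.
# Assumption: one suffix at most is removed per call, longest match first.
SUFFIXES = ("ment", "al")


def strip_one_suffix(word, min_stem=3):
    """Lowercase the word and remove a single matching suffix, if any."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Only strip when enough of the word is left to act as a stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("Environment", "Environmental"):
        print(w, "->", strip_one_suffix(w))
```

Because only one suffix comes off per word, "Environmental" loses "al" and
stops at 'environment', while "Environment" loses "ment" and becomes
'environ'.  The two index terms never meet, so the hit counts differ.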

-- 
Bill Moseley
moseley@hank.org

Received on Tue Oct 4 13:40:24 2005