Re: SwishFuzzyWordError() and missing stemmer constants

From: Antony Dovgal <antony(at)not-real.zend.com>
Date: Mon Jan 29 2007 - 17:23:42 GMT

On 01/29/2007 07:45 PM, Bill Moseley wrote:
>> "Not all stemmers set this value correctly." - well, this means at least some of them 
>> DO return correct values. That's better than nothing.
>> Maybe it's time to fix those returning incorrect values?
> 
> The stemming code in swish mixes "stemmers" from different sources.
> So not all errors apply to all stemmers.

Sure. But what does this imply?
"Not all error codes apply to all stemmers" - this sounds perfectly ok.
But "not all stemmers set this value correctly" means that some of the stemmers set an incorrect value (i.e. they are broken).
Or did I misunderstand you?

>> "But since SwishFuzzyWordList() will return a valid string regardless of the return value, 
>> you can often just ignore this setting. That's what I do." - how often should I ignore it? =)
>> I mean, if the value of this function should be ignored, then the function itself is useless.
> 
> It's not important to swish -- swish just passes in words and if
> there's a problem (like the word can't be stemmed) then it uses the
> un-stemmed word for indexing and searching.

Ok, but I was talking about an interface to libswish.
Swish may do internally whatever Swish likes, but some clarity in the docs would be helpful for third-party apps.
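
To illustrate what a third-party app would want from a documented error code, here is a minimal sketch of the fallback described above (use the un-stemmed word when stemming fails). Everything in it is a hypothetical stand-in: the demo_* names, the enum values and the struct layout are assumptions for illustration and are not the libswish API. Only the STEM_TO_NOTHING idea comes from the quoted code.

```c
#include <stddef.h>

/* Hypothetical status codes, loosely modelled on STEM_TO_NOTHING from the
 * quoted snippet. The real constants and their values may differ. */
typedef enum {
    DEMO_STEM_OK = 0,
    DEMO_STEM_TO_NOTHING
} demo_stem_status;

/* Hypothetical stand-in for a stemming result: a status plus the stemmed
 * string, which may be NULL or empty when stemming failed. */
typedef struct {
    demo_stem_status error;
    const char *stemmed;
} demo_fuzzy_word;

/* Pick the term to index or search with: the stemmed form when the status
 * says it is usable, otherwise the original word. */
const char *demo_index_term(const demo_fuzzy_word *fw, const char *original)
{
    if (fw == NULL || fw->error != DEMO_STEM_OK)
        return original;
    if (fw->stemmed == NULL || fw->stemmed[0] == '\0')
        return original;
    return fw->stemmed;
}
```

With public constants a caller can write the comparison against a named value instead of guessing what a bare integer means, which is the whole point of the patch.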

> It's been a long time since I looked at the Snowball API, but looking
> at this bit of code:
> 
>     fi->stemmer->lang_stem(snowball); /* Stem the word */
> 
> 
>     if ( 0 == snowball->l )
>     {
>         fw->error = STEM_TO_NOTHING;
>         return fw;
>     }
> 
> Shouldn't the return value of calling lang_stem() be tested?  Or maybe
> testing the length is fine.  I'm not sure.

From what I can see in src/snowball/stem_en1.c, porter_stem() always returns 1.
So it looks like the check is ok.
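
If one wanted to be defensive anyway, testing both the return value and the length is cheap. A rough sketch follows, with the caveat that every name in it is hypothetical (demo_env, demo_run_stemmer, the DEMO_* codes and the three toy stemmers) and that treating "anything other than 1" as failure is my assumption, not something taken from the Snowball docs.

```c
#include <stddef.h>

/* Hypothetical status codes. DEMO_STEM_FAILED is invented for this sketch. */
typedef enum {
    DEMO_STEM_OK = 0,
    DEMO_STEM_FAILED,
    DEMO_STEM_TO_NOTHING
} demo_status;

/* Hypothetical stand-in for the stemmer environment; l plays the role of
 * snowball->l (length of the word after stemming) in the quoted code. */
typedef struct {
    int l;
} demo_env;

typedef int (*demo_stem_fn)(demo_env *env);

/* Toy stemmers so the sketch is self-contained. */
int demo_stem_keep(demo_env *env)  { (void)env; return 1; } /* always returns 1 */
int demo_stem_erase(demo_env *env) { env->l = 0; return 1; } /* stems to nothing */
int demo_stem_fail(demo_env *env)  { (void)env; return 0; } /* reports failure */

/* Run the stemmer, then check the return value first and the length second. */
demo_status demo_run_stemmer(demo_stem_fn stem, demo_env *env)
{
    int rc = stem(env);

    if (rc != 1)           /* assumption: 1 means success */
        return DEMO_STEM_FAILED;
    if (env->l == 0)       /* same length test as the existing code */
        return DEMO_STEM_TO_NOTHING;
    return DEMO_STEM_OK;
}
```

For a stemmer that always returns 1 the first test never fires, so the behaviour is the same as the current length-only check.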
 
>> Hence the question: 
>> Would you accept a patch exporting those constants to public (and changing the 
>> function prototype appropriately) or should I forget about SwishFuzzyWordError()?
>> See diff against current CVS in attachment.
> 
> I think the patch makes sense.  I'm not sure why the STEM_RETURNS
> struct was not made public.

Great.

-- 
Wbr, 
Antony Dovgal
Received on Mon Jan 29 09:23:43 2007