On 01/29/2007 07:45 PM, Bill Moseley wrote:
>> "Not all stemmers set this value correctly." - well, this means at least some of them
>> DO return correct values. That's better than nothing.
>> Maybe it's time to fix those returning incorrect values?
>
> The stemming code in swish mixes "stemmers" from different sources.
> So not all errors apply to all stemmers.
Sure. But what does this imply?
"Not all error codes apply to all stemmers" - this sounds perfectly ok.
But "not all stemmers set this value correctly" means that some of the stemmers set an incorrect value (i.e. they are broken).
Or did I misunderstand you?
>> "But since SwishFuzzyWordList() will return a valid string regardless of the return value,
>> you can often just ignore this setting. That's what I do." - how often should I ignore it? =)
>> I mean, if the value of this function should be ignored, then the function itself is useless.
>
> It's not important to swish -- swish just passes in words and if
> there's a problem (like the word can't be stemmed) then it uses the
> un-stemmed word for indexing and searching.
Ok, but I was talking about an interface to libswish.
Swish may do whatever it likes internally, but clearer docs would help third-party apps.
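From a third-party app's side, the fallback Bill describes can be sketched in C. None of this is libswish API. `toy_stem()`, `word_for_index()` and the `TOY_STEM_*` codes are hypothetical, and the sketch assumes only that a stemmer reports a status code alongside its output. The status is kept for diagnostics, and the word passed on is always usable:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status codes; these are NOT the libswish names. */
typedef enum { TOY_STEM_OK, TOY_STEM_TO_NOTHING } stem_status;

/* Hypothetical toy stemmer: strips one trailing 's'.
 * Writes the stem into out (outlen bytes) and returns a status. */
stem_status toy_stem(const char *word, char *out, size_t outlen)
{
    size_t n = strlen(word);

    assert(outlen > 0);
    if (n > 0 && word[n - 1] == 's')
        n--;                        /* drop the trailing 's' */
    if (n == 0)
        return TOY_STEM_TO_NOTHING; /* nothing left after stemming */
    if (n >= outlen)
        n = outlen - 1;             /* truncate to fit the buffer */
    memcpy(out, word, n);
    out[n] = '\0';
    return TOY_STEM_OK;
}

/* Returns the word to index or search for: the stem on success,
 * otherwise the original word. The status is kept for diagnostics. */
const char *word_for_index(const char *word, char *buf, size_t buflen,
                           stem_status *status)
{
    *status = toy_stem(word, buf, buflen);
    return (*status == TOY_STEM_OK) ? buf : word;
}
```

With `char buf[64]`, `word_for_index("stemmers", buf, sizeof buf, &st)` yields "stemmer". For "s" it yields "s" and sets `st` to `TOY_STEM_TO_NOTHING`, so the caller gets a usable word either way and can still log why stemming failed.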
> It's been a long time since I looked at the Snowball API, but looking
> at this bit of code:
>
> fi->stemmer->lang_stem(snowball); /* Stem the word */
>
> if ( 0 == snowball->l )
> {
>     fw->error = STEM_TO_NOTHING;
>     return fw;
> }
>
> Shouldn't the return value of calling lang_stem() be tested? Or maybe
> testing the length is fine. I'm not sure.
From what I can see in src/snowball/stem_en1.c, porter_stem() always returns 1.
So it looks like the check is ok.
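If lang_stem() always returns 1, as porter_stem() appears to, then only the output length can signal a failed stem. Here is a minimal sketch of that length check. `struct toy_env` is a hypothetical stand-in that keeps only a buffer `p` and a length `l`, echoing `snowball->l` above. It is not Snowball's real environment struct. The "strip trailing 'e'" stemmer is made up so that a word can stem to nothing:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the Snowball environment: only the
 * working buffer p and its length l, as used in the quoted code. */
struct toy_env {
    char p[64];
    int  l;
};

/* Hypothetical stemmer that, like porter_stem() as read above,
 * always returns 1. It strips every trailing 'e', so l can reach 0. */
int toy_lang_stem(struct toy_env *z)
{
    while (z->l > 0 && z->p[z->l - 1] == 'e')
        z->l--;
    return 1;
}

enum { TOY_STEM_OK = 0, TOY_STEM_TO_NOTHING = 1 };

/* Loads word into z, stems it, and detects failure from the resulting
 * length, since the stemmer's return value carries no information. */
int stem_word(struct toy_env *z, const char *word)
{
    size_t n = strlen(word);

    assert(n < sizeof z->p);  /* leave room for the terminator */
    memcpy(z->p, word, n);
    z->l = (int)n;

    (void)toy_lang_stem(z);   /* always 1, so not worth testing */
    z->p[z->l] = '\0';        /* terminate for callers wanting a C string */

    if (z->l == 0)
        return TOY_STEM_TO_NOTHING;
    return TOY_STEM_OK;
}
```

"tree" stems to "tr" with `TOY_STEM_OK`. "ee" ends with `l == 0` and is reported as `TOY_STEM_TO_NOTHING`, even though the stemmer itself returned 1.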
>> Hence the question:
>> Would you accept a patch exporting those constants to public (and changing the
>> function prototype appropriately) or should I forget about SwishFuzzyWordError()?
>> See diff against current CVS in attachment.
>
> I think the patch makes sense. I'm not sure why the STEM_RETURNS
> struct was not made public.
Great.
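Making the codes public would presumably let an app compare the value from SwishFuzzyWordError() against named constants instead of magic numbers. Below is a rough sketch of such a public declaration plus a small helper. Only `STEM_TO_NOTHING` comes from the code quoted above. `STEM_OK`, declaring `STEM_RETURNS` as an enum, and `stem_return_string()` are assumptions for illustration, and the real list in swish-e may have more values:

```c
#include <assert.h>

/* Sketch of a public header fragment. STEM_TO_NOTHING appears in the
 * quoted swish-e code; STEM_OK and the enum form are assumptions. */
typedef enum {
    STEM_OK,          /* assumed: word stemmed successfully */
    STEM_TO_NOTHING   /* word stemmed to the empty string */
} STEM_RETURNS;

/* Hypothetical helper: maps a status to a message for logs or UIs. */
const char *stem_return_string(STEM_RETURNS code)
{
    switch (code) {
    case STEM_OK:
        return "ok";
    case STEM_TO_NOTHING:
        return "word stemmed to nothing";
    }
    assert(!"unknown STEM_RETURNS value");
    return "unknown stemmer status";
}
```

Keeping the helper next to the enum means a new status code needs only one extra `case`.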
--
Wbr,
Antony Dovgal
Received on Mon Jan 29 09:23:43 2007