Re: [swish-e] Snowball stemmers

From: Peter Karman <peter(at)>
Date: Thu Oct 04 2007 - 01:56:11 GMT
Trygve Falch wrote on 10/2/07 4:05 AM:
>  There are two ways I
> could solve this: either introduce UTF-8 stemmers, with the changes
> needed in the swish-e code to accommodate that, or port the old
> Russian ISO stemmer to fit the new API.
> Any comments?

IIRC, stemming happens after tokenization in 2.4.x, so it makes more sense to me
to port the old Russian stemmer to fit the newer Snowball API. Otherwise there is
too much lossiness, going from UTF-8 (parsing) to ISO-8859-x (tokenizing) to
UTF-8 (stemming) to ISO-8859-x (storage).
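
To illustrate the lossiness of the round trip described above, here is a minimal Python sketch (not swish-e code). The helper name `round_trip` and the choice of ISO-8859-5 as the "ISO-8859-x" charset are assumptions made for the example; it only shows what a single UTF-8 to ISO-8859-x hop can do to characters the 8-bit charset cannot represent.

```python
def round_trip(text: str, charset: str = "iso-8859-5") -> str:
    """Encode text to a single-byte charset and decode it back.

    Hypothetical helper for illustration only. Characters the charset
    cannot represent are replaced with '?', which is the kind of loss
    each UTF-8 -> ISO-8859-x conversion risks.
    """
    return text.encode(charset, errors="replace").decode(charset)


if __name__ == "__main__":
    samples = (
        "стемминг",      # Cyrillic only: representable in ISO-8859-5
        "na\u00efve",    # "i" with diaeresis: not in ISO-8859-5
    )
    for word in samples:
        result = round_trip(word)
        status = "intact" if result == word else "lossy"
        print(f"{word!r} -> {result!r} ({status})")
```

Text that stays within the target charset survives, but anything outside it is damaged on every pass, so fewer conversions between parsing and storage means fewer chances to lose data.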

Peter Karman  .  .  peter(at)
Received on Wed Oct 3 21:56:09 2007