
Re: [swish-e] Snowball stemmers

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Oct 04 2007 - 01:56:11 GMT
Trygve Falch wrote on 10/2/07 4:05 AM:
>  There are two ways I
> could solve this: either introduce UTF-8 stemmers, with the changes
> needed in the swish-e code to accommodate that, or I could port the old
> Russian ISO stemmer to fit the new API.
> 
> Any comments?
> 

IIRC, stemming happens after tokenization in 2.4.x, so it makes more sense to me
to port the old Russian stemmer to the newer Snowball API. Otherwise there is too
much lossiness going from UTF-8 (parsing) to ISO-8859-x (tokenizing) to
UTF-8 (stemming) to ISO-8859-x (storage).
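To make the lossiness concrete, here is a rough simulation of that chain of conversions. It is not swish-e code: ISO-8859-5 is assumed as the "ISO-8859-x" charset for Russian, the stemmer is a no-op placeholder, and the helper names (to_single_byte, to_unicode, pipeline) are hypothetical. Characters the single-byte charset cannot represent are replaced with '?' at the first downgrade, and no later step can recover them.

```python
# Hypothetical simulation of the encoding round trips, NOT swish-e code.
# Assumption: ISO-8859-5 stands in for "ISO-8859-x"; the stemmer is a no-op.

CHARSET = "iso-8859-5"

def to_single_byte(text: str) -> bytes:
    """Tokenizing/storage step: downgrade to the single-byte charset.
    Unrepresentable characters become b'?'."""
    return text.encode(CHARSET, errors="replace")

def to_unicode(raw: bytes) -> str:
    """Stemming step: convert back to Unicode (UTF-8 in the C code)."""
    return raw.decode(CHARSET)

def pipeline(parsed: str) -> str:
    tokenized = to_single_byte(parsed)   # UTF-8 -> ISO-8859-x
    stemmed = to_unicode(tokenized)      # ISO-8859-x -> UTF-8 (no-op "stemmer")
    stored = to_single_byte(stemmed)     # UTF-8 -> ISO-8859-x
    return to_unicode(stored)            # decode only so we can compare

if __name__ == "__main__":
    for word in ["привет", "café"]:
        result = pipeline(word)
        print(f"{word!r} -> {result!r} lossless={result == word}")
```

Pure Cyrillic survives the round trip, while "café" comes back as "caf?" because ISO-8859-5 has no "é". A stemmer that works directly on the single-byte text would, under these assumptions, drop the middle two conversions and leave only the first downgrade.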

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Oct 3 21:56:09 2007