Re: [swish-e] Snowball stemmers

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Wed Oct 24 2007 - 04:02:52 GMT

Trygve Falch wrote on 10/4/07 7:04 AM:
> On Wed, 2007-10-03 at 20:56 -0500, Peter Karman wrote:
> 
>> IIRC, stemming happens after tokenization in 2.4.x, so it makes more sense to me
>> to port the old Russian stemmer to fit the newer Snowball API. Too much
>> lossiness otherwise, going from UTF-8 (parsing) to ISO-8859-x (tokenizing) to
>> UTF-8 (stemming) to ISO-8859-x (storage).
> 
> I had missed the Russian stemmer for KOI8, which, as far as I could
> tell, is the same stemmer that was used in the old Snowball.
> 
> I have also added additional languages for Hungarian and Romanian, and
> of course the Russian stemmer.
> 
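
For anyone reading the archive later, the lossiness concern quoted above can be illustrated with a small Python sketch. This is not swish-e code; the helper name round_trip and the sample strings are made up for illustration, and it only assumes Python's standard codecs. It shows that text survives a trip through an 8-bit charset only when that charset can represent every character.

```python
def round_trip(text: str, legacy: str = "iso-8859-1") -> str:
    """Encode text to an 8-bit charset and decode it back.

    Characters the charset cannot represent are replaced with '?',
    which is where information gets lost.
    """
    return text.encode(legacy, errors="replace").decode(legacy)


latin = "café"
russian = "стеммер"

# Latin-1 text survives the ISO-8859-1 round trip unchanged.
print(round_trip(latin) == latin)

# Cyrillic text does not fit in ISO-8859-1, so every letter is replaced.
print(round_trip(russian))

# The same text survives if the 8-bit charset is KOI8-R.
print(round_trip(russian, "koi8-r") == russian)
```

The point is only that each extra conversion between UTF-8 and an 8-bit charset is a chance to lose characters, so a stemmer that works directly in the storage encoding avoids the problem.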

This patch has been committed to trunk with r1949.

Thanks, Trygve.

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Oct 24 00:02:48 2007