Re: Snowball stemmers

From: Bill Moseley <moseley(at)>
Date: Thu Jan 12 2006 - 18:09:21 GMT
On Thu, Jan 05, 2006 at 08:45:09AM -0800, Bruusgaard, Jan wrote:
> Is it possible for someone to update the Snowball stemming in Swish-e?
> I downloaded latest nightly build of Swish-e, and tried to change
> stem_no.c with a new stem_no.c from Snowball, because the Norwegian
> algorithm here has been improved.
> But Snowball has changed their API after they started supporting UTF-8,
> so it seems to be some work here for someone with C programming skills.

Sorry for the delay in responding.  It would likely be a while before
I could take a look at this -- and I suspect the other developers
are busy, too.  If you could get an initial patch working then it
would likely find its way into Swish-e a lot sooner.

I have not looked at the new API, but if it requires UTF-8 on input then
we would need to use iconv to convert from Swish-e's ISO-8859-1 to UTF-8
before stemming, and back again afterwards.  Look also at parser.c to see
how Swish-e uses libxml2's method to convert UTF-8 to Latin-1.
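
To make the round trip concrete, here is a rough sketch of the iconv side
only.  The helper names (convert_encoding, latin1_to_utf8, utf8_to_latin1)
are made up for illustration and are not part of Swish-e or Snowball; the
only library calls used are the standard iconv_open/iconv/iconv_close.  I
have not checked this against the new Snowball API, so treat it as a
starting point rather than a patch.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from Swish-e): convert a NUL-terminated string
 * from one encoding to another.  Returns a malloc'd NUL-terminated string
 * that the caller must free, or NULL on any failure. */
char *convert_encoding(const char *from, const char *to, const char *input)
{
    iconv_t cd = iconv_open(to, from);   /* note the order: to, then from */
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(input);
    /* Latin-1 -> UTF-8 at most doubles the length; 4x is a generous bound. */
    size_t out_size = in_left * 4 + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)input;        /* iconv() takes char **, not const */
    char *out_ptr = out;
    size_t out_left = out_size - 1;      /* keep one byte for the NUL */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1) {
        /* Invalid input, or a character the target set cannot represent. */
        free(out);
        return NULL;
    }

    *out_ptr = '\0';
    return out;
}

/* Convenience wrappers for the two directions discussed above. */
char *latin1_to_utf8(const char *s)
{
    return convert_encoding("ISO-8859-1", "UTF-8", s);
}

char *utf8_to_latin1(const char *s)
{
    return convert_encoding("UTF-8", "ISO-8859-1", s);
}
```

The idea would be to call latin1_to_utf8() on each word before handing it
to the stemmer and utf8_to_latin1() on the result.  Opening a conversion
descriptor per word is wasteful, so a real patch would probably open the
two descriptors once and reuse them.  With glibc no extra library is
needed; on some other systems you may have to link with -liconv.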

Maybe someone else on the list with C skills would be interested in
helping?  Anyone?

Bill Moseley

