RE: Multilanguage stemmers - norwegian

From: Bruusgaard, Jan <jan.bruusgaard(at)not-real.ssb.no>
Date: Wed Jul 02 2003 - 14:27:27 GMT
Hi Jose

I have been testing the Norwegian stemmer a bit more, but it does not seem to work correctly in every case.

Is there some kind of limit on word length?
In my swish-e config file I have set: MaxWordLimit 40. I also downloaded the 2003-07-01 version.

In the examples below, the stemmer should have removed the suffix "en".

In example 3 the stemming works correctly: the number of hits is 461 in both examples 3 and 4. That is not the case in examples 1 and 2 (90 hits versus 270).
The Norwegian Snowball page shows the right stemming for the word "konsumprisindeksen".

Jan

--- output ---

1
# SWISH format: 2.4.0-pr1
# Search words: konsumprisindeks
# Removed stopwords: 
# Number of hits: 90
# Search time: 0.001 seconds
# Run time: 0.032 seconds
1000 http://www.ssb.no/emner/08/02/10/hkpi/index.html "Konsumprisindeks, harmonisert" 640

2
# Search words: konsumprisindeksen
# Number of hits: 270
1000 http://www.ssb.no/emner/08/02/10/kpi/index.html "Konsumprisindeksen - hovedside" 697

3
# Search words: boligtelling
# Number of hits: 461
1000 http://www.ssb.no/emner/02/01/fobbolig/index.html "Endelige tall fra boligtellingen. Folke- og boligtellingen 2001" 675

4
# Search words: boligtellingen
# Number of hits: 461
1000 http://www.ssb.no/emner/02/01/fobbolig/index.html "Endelige tall fra boligtellingen. Folke- og boligtellingen 2001" 675
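
The comparison above can be automated. What follows is only a minimal sketch in Python, not part of swish-e: it reads the "# Search words:" and "# Number of hits:" header lines from output like that shown above and reports whether a base form and its inflected form return the same number of hits. The function names (hits_per_query, same_hit_count) and the SAMPLE text are hypothetical and exist only for illustration. Equal hit counts only suggest that both forms were reduced to the same stem; they do not prove it.

```python
import re


def hits_per_query(output: str) -> dict[str, int]:
    """Map each '# Search words:' value to the following '# Number of hits:' value."""
    hits: dict[str, int] = {}
    current = None
    for line in output.splitlines():
        m = re.match(r"#\s*Search words:\s*(.+?)\s*$", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"#\s*Number of hits:\s*(\d+)", line)
        if m and current is not None:
            hits[current] = int(m.group(1))
            current = None
    return hits


def same_hit_count(hits: dict[str, int], base: str, inflected: str) -> bool:
    """True if both query words returned the same number of hits."""
    return hits[base] == hits[inflected]


# Header lines copied from the four examples in this message.
SAMPLE = """\
# Search words: konsumprisindeks
# Number of hits: 90
# Search words: konsumprisindeksen
# Number of hits: 270
# Search words: boligtelling
# Number of hits: 461
# Search words: boligtellingen
# Number of hits: 461
"""

hits = hits_per_query(SAMPLE)
pairs = [
    ("konsumprisindeks", "konsumprisindeksen"),
    ("boligtelling", "boligtellingen"),
]
for base, inflected in pairs:
    verdict = "same" if same_hit_count(hits, base, inflected) else "different"
    print(f"{base} / {inflected}: {hits[base]} vs {hits[inflected]} hits ({verdict})")
```

Running the same check over a longer list of word pairs would show whether the problem is tied to word length or to particular endings.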


-----Original Message-----
From: jmruiz@boe.es [mailto:jmruiz@boe.es]
Sent: 16 June 2003 18:28
To: Bruusgaard, Jan
Subject: Re: SV: [SWISH-E] Multilanguage stemmers - norwegian



Forgot to mention...

I have removed from CVS the annoying messages you
noticed. Once again, it was my fault: they were
a couple of debug messages. You can delete them
in index.c or update from CVS.

Jose

On 16 Jun 2003 at 16:38, Bruusgaard, Jan wrote:

> Hi.
> 
> I installed the new version of swish-e, and it seems to work, but I am not
> sure whether it is the Norwegian stemmer I am using. I have to test a bit more.
> 
> I had to use:
> 
> UseStemming yes
> # Put yes to apply word stemming algorithm during indexing,
> # else no. See the manual for info about stemming.
> 
> If I use: 
> 
> UseStemming no
> 
> then "no" means no, not Norwegian, and no stemming is done.
> 
> 
> When indexing I also get a lot of these messages:
> 
> (...)
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> Antes Stemm index.c 0x400ccc68
> Despues Stemm index.c
> (...)
> 
> 
> Jan
> 
> -----Original Message-----
> From: jmruiz@boe.es [mailto:jmruiz@boe.es]
> Sent: 10 June 2003 18:19
> To: Multiple recipients of list
> Subject: [SWISH-E] Multilanguage stemmers
> 
> 
> Hi,
> 
> The rest of the Snowball stemmers have been added to swish
> (no,se,dk,fi,ru?). See previous posts about this issue to see how to
> use them.
> 
> Testers around the world are welcome ;)
> 
> cu
> Jose
> 
Received on Wed Jul 2 14:27:32 2003