Re: RE: stemming

From: SRE <eckert(at)>
Date: Fri Nov 19 1999 - 16:51:08 GMT

At 10:31 PM 11/18/99 -0800, Bill Moseley wrote:
>The original poster might check that there aren't other words that might
>stem to rocki in the returned document.  I can't think of any right now.  I
>can't get Swish to find rock when searching for rockies.  I tried it on a
>freshly downloaded and compiled SWISH 1.3.2.

Our email crossed in the ether - I posted "swish-e -D" and "grep" results
last night. Indeed, "rock" is not found when searching for "rockies", but
"rocky" is found when searching for "rockies" via stemming to "rocki".

One could argue that "rocky" is a derivative of "rock", and as such
"rocky" should stem to "rock" instead of "rocki", but I'm no expert
on how the stemmer was designed... and with English you'll never get
all the special cases right.
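
A rough sketch of why the three words behave this way, assuming the
stemmer applies Porter-style suffix rules (an assumption; nothing in
this thread confirms which algorithm Swish uses). Only two rules are
modelled here: plural stripping ("ies" -> "i") and the final
"y" -> "i" rewrite. toy_stem and has_vowel are hypothetical names made
up for illustration, not Swish functions.

```python
VOWELS = set("aeiou")


def has_vowel(stem):
    """Simplified vowel test: a, e, i, o, u, or a 'y' that follows a consonant."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return True
        if ch == "y" and i > 0 and stem[i - 1] not in VOWELS:
            return True
    return False


def toy_stem(word):
    """Apply two Porter-style rules only; a real stemmer has many more steps."""
    word = word.lower()

    # Plural endings: "sses" -> "ss", "ies" -> "i", "ss" kept, trailing "s" dropped.
    if word.endswith("sses"):
        word = word[:-2]
    elif word.endswith("ies"):
        word = word[:-2]
    elif word.endswith("ss"):
        pass
    elif word.endswith("s"):
        word = word[:-1]

    # Final "y" becomes "i" when the rest of the word contains a vowel.
    if word.endswith("y") and has_vowel(word[:-1]):
        word = word[:-1] + "i"

    return word


if __name__ == "__main__":
    for w in ("rock", "rocky", "rockies"):
        print(w, "->", toy_stem(w))
```

Under these two rules "rocky" and "rockies" end up as the same index
term while "rock" is left alone, which matches the search results
described above. No rule here strips the "y" entirely, so nothing maps
"rocky" back to "rock".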

>You can find out more in the Swish list archive, but you should know that
>wild card searches don't work as expected with Stemming.

My sysadmin won't let me pass wildcards through the CGI script.
The form is set up to strip all that stuff, which is why I turned
stemming on in the first place (poor man's wild card?).

What's the general opinion of people who've been using swish for a long
time? Does stemming pay off? Is it too slow? Does it return false hits
that confuse the lo-tech users? (My site has no expectations yet, as
this is the first search option we've made available and it's not even
announced yet.)

>In other words, Swish pulls perfectly good words out of the
>index, stems them, and then fails to find them in the index again.

Thanks for the tip. If I ever enable wild cards, I'll turn off stemming.


"If you think education is expensive, try ignorance."
		-- Derek Bok, president of Harvard
Received on Fri Nov 19 09:04:47 1999