RE: RE: stemming

From: David Norris <dave(at)not-real.webaugur.com>
Date: Fri Nov 19 1999 - 21:21:33 GMT
> My sysadmin won't let me pass wildcards through the CGI script.

That is not an entirely bad idea depending on the web server's config.

> What's the general opinion of people who've been using swish for a
> long time? Does stemming pay off?

I've not noticed any major ill effects of stemming on my system.  It
generally works quite well.  I added a minimum word length to my
stemmer.c, per Bill Moseley's suggestion I believe.  That seems to
have cut down on a number of false or inaccurate result lists.  Short
words don't stem well: they often look like a suffix that should be
stripped, so you can end up with a 0 character word after stemming.
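
A minimal sketch of that kind of minimum-length guard, in C.  This is
not the code in stemmer.c: the threshold MIN_STEM_LENGTH, the names
toy_stem and guarded_stem, and the toy suffix stripper itself are all
assumptions made for illustration.  The only point is the shape of the
guard: skip short words, and never hand back an empty string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold; the post does not say which value was used. */
#define MIN_STEM_LENGTH 4

/* Toy stand-in for a real stemmer: strips one trailing "ing", "ed" or "s".
 * This is NOT the algorithm in stemmer.c; it only gives the guard below
 * something to wrap. */
static void toy_stem(char *word)
{
    static const char *suffixes[] = { "ing", "ed", "s" };
    size_t len = strlen(word);
    size_t i;

    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t slen = strlen(suffixes[i]);
        if (len >= slen && strcmp(word + len - slen, suffixes[i]) == 0) {
            word[len - slen] = '\0';
            return;
        }
    }
}

/* Stem `word` in place, but leave short words untouched and undo any
 * stemming that would leave an empty string.
 * Returns 1 if the word was changed, 0 otherwise. */
int guarded_stem(char *word)
{
    char saved[64];
    size_t len;

    assert(word != NULL);
    len = strlen(word);

    /* Too short to stem reliably, or too long for the scratch buffer. */
    if (len < MIN_STEM_LENGTH || len >= sizeof saved)
        return 0;

    memcpy(saved, word, len + 1);
    toy_stem(word);

    if (word[0] == '\0') {          /* stemming ate the whole word */
        memcpy(word, saved, len + 1);
        return 0;
    }
    return strcmp(word, saved) != 0;
}
```

Called on its own, toy_stem turns "ed" into an empty string, which is
the 0 character case described above; guarded_stem leaves "ed" alone
and still reduces "indexing" to "index".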

> Is it too slow

I would call it anything but slow.  Total running time for my entire
script including forking SWISH is rarely more than a few hundred
milliseconds.  It is almost instantaneous.

> that confuse the lo-tech users?

I added a bit of code to the output parser in my search script that
looks for "Stemming Applied: 1" and prints a note indicating that
stemming is active on that particular index.  I did the same for my
Soundex module, as well.  That way a search of multiple indices will
automatically indicate which options are active.
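
A rough sketch of that kind of header check, written in C for
consistency with stemmer.c.  The post does not say what language the
search script is in.  The function names and the wording of the note
are assumptions; only the string "Stemming Applied: 1" comes from the
description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `output` contains `flag` with nothing but end of line,
 * end of string or a space after it; 0 otherwise.
 * `output` is the captured text printed by the search program. */
int header_flag_set(const char *output, const char *flag)
{
    size_t flen;
    const char *p;

    assert(output != NULL && flag != NULL);
    flen = strlen(flag);
    if (flen == 0)
        return 0;

    for (p = strstr(output, flag); p != NULL; p = strstr(p + 1, flag)) {
        char next = p[flen];
        /* Accept only when the value ends here, so "1" does not match "10". */
        if (next == '\0' || next == '\n' || next == '\r' || next == ' ')
            return 1;
    }
    return 0;
}

/* Returns a note for the results page, or "" when stemming is off.
 * The wording of the note is illustrative only. */
const char *stemming_note(const char *output)
{
    return header_flag_set(output, "Stemming Applied: 1")
        ? "Note: this index uses word stemming."
        : "";
}
```

Running the check once per index's output gives a separate note for
each index in a multi-index search.  The same helper could be pointed
at whatever header string another option prints.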

> does it return false hits

It could potentially do that.  I'd be just as concerned with it not
matching words which it should.  It is a trade-off; it will make a few
mistakes.  You just don't want it to match or miss too many words
either way.

http://www.webaugur.com/search/

--
David Norris
  The OpenSA Project - http://www.opensa.de/
  Dave's Web - http://www.webaugur.com/dave/
  ICQ Universal Internet Number - 412039
  E-Mail - dave@webaugur.com
Received on Fri Nov 19 13:26:07 1999