Re: Indexing/Searching for Plurals

From: Craig A Summerhill <craig(at)not-real.cni.org>
Date: Thu Aug 13 1998 - 06:19:54 GMT
On Tue, 11 Aug 1998, Mark Gaulin <gaulin@designinfo.com> wrote:
> 
> I am playing around with a Stem() function from WAIS, and it works
> well (for what it is).
> 
snip 
> 
> Using wildcard searches seems like a partial solution, since there
> are two sets of words that are of interest: the words the user wants
> to find and the words that the author(s) of the corpus happened to
> use. If I could be sure that all of my documents used "motor"
> and not "motors" then there would be less of a problem. Since that
> is not the case I want to have more control.  (This is a weak argument,
> I know. Basically, automatic de-stemming is just "easier" to use,
> in my opinion.)

There is, of course, an alternate approach to solving this problem 
that has been employed in a number of the better commercial search 
engines such as BRS/SEARCH (now Dataware II) for years.

You can allow the user to browse the index and select words directly
from it to be incorporated into a search.  BRS/SEARCH calls the
feature "Browse Right" and "Browse Left" (sorted left to right), which
presents the user with a section of the index showing the words which
occur immediately around the selected word in the index.  
The user then can move up and down through the index and select the 
words to incorporate into a search. 

All words selected from the index are searched for using an OR operator
to generate a search set (which can then be modified by adding other
words or operators). 

       motivate
       motivating
       motive
       motives
       motley
       motocross
 --> X motor
       motorbike
       motorboat
       motorcar
       motorcicle
       motorcycle
       motorhome
       motorist
     X motorize
     X motors
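
For anyone wanting to experiment with the idea, here is a minimal sketch
in Python of browsing a sorted index and OR-ing the picked words.  It is
not how BRS/SEARCH implements the feature; the function names (browse,
or_query), the window sizes, and the "OR" query syntax are all
assumptions made up for illustration.

```python
from bisect import bisect_left

# Hypothetical sorted word index, taken from the listing above.
INDEX = sorted([
    "motivate", "motivating", "motive", "motives", "motley", "motocross",
    "motor", "motorbike", "motorboat", "motorcar", "motorcycle",
    "motorcicle", "motorhome", "motorist", "motorize", "motors",
])

def browse(index, term, before=6, after=9):
    """Return the slice of a sorted index surrounding `term`.

    `term` need not be present; the window is centered on the position
    where it would be inserted.
    """
    pos = bisect_left(index, term)
    start = max(0, pos - before)
    return index[start:pos + after + 1]

def or_query(selected):
    """Join the words the user marked into one OR expression."""
    return " OR ".join(selected)

# The user browses around "motor" and marks three entries.
window = browse(INDEX, "motor")
marked = {"motor", "motorize", "motors"}
selected = [word for word in window if word in marked]
query = or_query(selected)
print(query)
```

The resulting string would then be handed to whatever search engine is
in use, where it can be combined with further words or operators.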

One of the things I use this feature for quite frequently is locating
documents within databases that have typographical errors in them.  On
more than one occasion I have located a piece of e-mail in the archives
of a list by browsing the index and discovering that the word I was
looking for had been misspelled by the author (although in most cases,
the same word is spelled right elsewhere in the document).  It is also 
useful for alternate spellings (e.g. British vs. American English).
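
What I do by eye while browsing could be roughly approximated in code.
The sketch below swaps manual browsing for an automatic similarity check
using difflib from the Python standard library; the 0.8 cutoff and the
function name likely_variants are assumptions chosen for illustration,
not part of any search engine.

```python
from difflib import get_close_matches

# Hypothetical slice of an index containing one misspelled entry.
NEIGHBORS = [
    "motor", "motorbike", "motorboat", "motorcar",
    "motorcicle", "motorcycle", "motorhome", "motorist",
]

def likely_variants(word, index, cutoff=0.8):
    """Return index entries that closely resemble `word`, excluding
    `word` itself.  Similarity is difflib's ratio, not a spell check."""
    matches = get_close_matches(word, index, n=5, cutoff=cutoff)
    return [m for m in matches if m != word]

variants = likely_variants("motorcycle", NEIGHBORS)
print(variants)
```

Any variants found this way could be added to the search with OR, the
same as words picked by hand from the index.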
-- 

   Craig A. Summerhill, Systems Coordinator and Program Officer
   Coalition for Networked Information
   21 Dupont Circle, N.W., Washington, D.C.   20036
   Internet: craig@cni.org   AT&Tnet (202) 296-5098
Received on Wed Aug 12 23:33:43 1998