On Tue, 11 Aug 1998, Mark Gaulin <gaulin@designinfo.com> wrote:
>
> I am playing around with a Stem() function from WAIS, and it works
> well (for what it is).
>
snip
>
> Using wildcard searches seems like a partial solution, since there
> are two sets of words that are of interest: the words the user wants
> to find and the words that the author(s) of the corpus happened to
> use. If I could be sure that all of my documents used "motor"
> and not "motors" then there would be less of a problem. Since that
> is not the case I want to have more control. (This is a weak argument,
> I know. Basically, automatic de-stemming is just "easier" to use,
> in my opinion.)

There is, of course, an alternate approach to solving this problem
that has been employed for years in a number of the better commercial
search engines, such as BRS/SEARCH (now Dataware II).

You can allow the user to browse the index and select words directly
from it to be incorporated into a search. BRS/SEARCH calls these
features "Browse Right" and "Browse Left" (with the index sorted from
left to right); they present the user with a section of the index
showing the words that occur immediately around the selected word.
The user can then move up and down through the index and select the
words to incorporate into a search.

All words selected from the index are combined with an OR operator
to generate a search set, which can then be modified by adding other
words or operators.

          motivate
          motivating
          motive
          motives
          motley
          motocross
    --> X motor
          motorbike
          motorboat
          motorcar
          motorcicle
          motorcycle
          motorhome
          motorist
        X motorize
        X motors
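
The browse-and-select workflow described above can be sketched in
Python. This is a minimal illustration under stated assumptions, not
how BRS/SEARCH actually works: the index is a toy in-memory dictionary
mapping each term to a set of document IDs, and the names INDEX,
browse and or_search are hypothetical.

```python
from bisect import bisect_left

# Hypothetical toy inverted index: term -> set of document IDs.
INDEX = {
    "motive": {1},
    "motley": {2},
    "motor": {3, 4},
    "motorbike": {5},
    "motorcicle": {6},
    "motorcycle": {5, 7},
    "motorize": {8},
    "motors": {4, 9},
}
TERMS = sorted(INDEX)  # the browsable, alphabetically sorted term list


def browse(term, before=3, after=3):
    """Return the slice of the sorted term list surrounding `term`.

    `term` need not be in the index; the window is centred on the
    position where it would be inserted in sort order.
    """
    pos = bisect_left(TERMS, term)
    start = max(0, pos - before)
    return TERMS[start:pos + after + 1]


def or_search(selected):
    """Union the document sets of every selected term (an implicit OR)."""
    result = set()
    for term in selected:
        result |= INDEX.get(term, set())
    return result


print(browse("motor", before=2, after=2))
# ['motive', 'motley', 'motor', 'motorbike', 'motorcicle']
print(sorted(or_search(["motor", "motorize", "motors"])))
# [3, 4, 8, 9]
```

Using a binary search over a sorted term list means a user can start
browsing from a word that is not in the index at all and still land
at the right spot.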

One of the things I use this feature for quite frequently is locating
documents within databases that have typographical errors in them. On
more than one occasion I have located a piece of e-mail in the archives
of a list by browsing the index and discovering that the word I was
looking for had been misspelled by the author (although in most cases
the same word is spelled correctly elsewhere in the document). It is
also useful for alternate spellings (e.g., British vs. American English).
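
Spotting a misspelling while browsing is done by eye, but the idea
can be approximated in code. The sketch below is a hypothetical
helper, not a BRS/SEARCH feature: it takes the terms surrounding a
word in sort order and flags those within one edit (Levenshtein
distance) of it, which would catch "motorcicle" next to "motorcycle".
The names edit_distance and suspicious_neighbors, and the radius and
max_dist defaults, are illustrative assumptions.

```python
from bisect import bisect_left


def edit_distance(a, b):
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suspicious_neighbors(term, sorted_terms, radius=5, max_dist=1):
    """Flag terms near `term` in sort order that are within `max_dist` edits of it."""
    pos = bisect_left(sorted_terms, term)
    window = sorted_terms[max(0, pos - radius):pos + radius + 1]
    return [t for t in window if t != term and edit_distance(term, t) <= max_dist]


TERMS = sorted([
    "motivate", "motivating", "motive", "motives", "motley", "motocross",
    "motor", "motorbike", "motorboat", "motorcar", "motorcycle",
    "motorcicle", "motorhome", "motorist", "motorize", "motors",
])
print(suspicious_neighbors("motorcycle", TERMS))  # ['motorcicle']
```

Because the window comes from alphabetical order, this only finds
variants that share a prefix with the word; a typo in the first
letters (say, "notorcycle") sorts far away and would be missed, just
as it would be when browsing by hand.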
--
Craig A. Summerhill, Systems Coordinator and Program Officer
Coalition for Networked Information
21 Dupont Circle, N.W., Washington, D.C. 20036
Internet: craig@cni.org AT&Tnet (202) 296-5098
Received on Wed Aug 12 23:33:43 1998