RE: RE: Failing to find a word

From: David Norris <kg9ae(at)not-real.geocities.com>
Date: Fri Oct 08 1999 - 17:51:35 GMT
> No, you want to stem the query since the index is stemmed.  In my search
> front-end I scan the query for wild cards.  If I find them I stem them
> before passing to Swish.  This was my work-around for that problem.  But it
> doesn't solve the double-stemming problem that causes swish to fail to find
> words.

That would fix the immediate problem of not matching double-stemmed
words.  If the index has stemming applied, then Swish should stem the
query as a whole.  However, I do not see a reliable way to correctly
stem a word containing a wildcard.  (Perhaps stem the word while
preserving the wildcard?)  It should probably be an either-or
situation: for a given word in the query, either wildcard-match it or
stem it.  Of course, this means that the wildcard would have to match
the stem of a word.  That almost seems logical, but not completely.
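
To make the either-or rule concrete, here is a minimal sketch in C.  It
is not Swish code: toy_stem() is a hypothetical stand-in for the real
Stem(), and process_query_word() is a name invented for illustration.
A word containing '*' is left alone so the wildcard is matched against
the stems in the index; any other word is stemmed in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a real stemmer: strips a trailing "ing"
 * or a trailing "s".  Swish's own Stem() is not reproduced here. */
static void toy_stem(char *word)
{
    size_t n = strlen(word);

    if (n > 5 && strcmp(word + n - 3, "ing") == 0)
        word[n - 3] = '\0';
    else if (n > 3 && word[n - 1] == 's')
        word[n - 1] = '\0';
}

/* Either-or rule for one query word (illustrative name).
 * Returns 1 if the word holds a wildcard and was left untouched,
 * 0 if it was stemmed in place. */
int process_query_word(char *word)
{
    assert(word != NULL);

    if (strchr(word, '*') != NULL)
        return 1;   /* wildcard: must match against stored stems */

    toy_stem(word); /* plain word: stem it like the index was */
    return 0;
}
```

With this rule a query word such as "search*" goes through unchanged,
while "searching" is reduced by the toy stemmer before lookup.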

> expandstar() seems like a good place to move the call to Stem(), as it
> seems to be after any stop words are removed, and all the words are
> processed one-by-one.  (again from my poor reading of the code.)

That is roughly the point at which individual 'words' can first be
extracted from the search query {strcpy(searchword, sp->line)}.  I
think that the expandstar function itself should be replaced with a
generic word processing mechanism.  The expandstar processing code
would probably be best separated into its own file.  That should make
writing additional word processing code much easier.  The end result
would have each word processing 'module' (expandstar, stem, etc.)
process the words one by one (perhaps based on some simple rules).
The generic word processing function would take an swline structure
(i.e. tmplist) as input and return an swline structure (i.e. newp).
Overall, this would probably require some modifications to index.c,
search.c, and stemmer.c (and soundex.c for my soundex module).
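
A rough sketch of what such a generic mechanism might look like
follows.  Everything here is an assumption made for illustration: the
swline layout (only the `line` member is confirmed by sp->line above;
`next` is a guess), the word_module type, and the two toy modules.
Each module rewrites one word in place, and process_words() copies the
input list and runs every module over every word in order.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout; only `line` is confirmed by the discussion above. */
struct swline {
    char *line;
    struct swline *next;
};

/* A word processing 'module' rewrites one word in place.
 * In this sketch a module may shorten a word but never lengthen it. */
typedef void (*word_module)(char *word);

/* Hypothetical module: fold the word to lower case. */
void mod_lowercase(char *word)
{
    for (; *word != '\0'; word++)
        *word = (char)tolower((unsigned char)*word);
}

/* Hypothetical module: toy stemmer that drops a trailing 's',
 * skipping any word that contains a wildcard. */
void mod_toy_stem(char *word)
{
    size_t n = strlen(word);

    if (strchr(word, '*') == NULL && n > 3 && word[n - 1] == 's')
        word[n - 1] = '\0';
}

/* Copy the input list and run every module over every word, in order.
 * The input is left unchanged.  On allocation failure the words
 * processed so far are returned. */
struct swline *process_words(const struct swline *in,
                             const word_module *mods, size_t nmods)
{
    struct swline *head = NULL;
    struct swline **link = &head;
    size_t i;

    for (; in != NULL; in = in->next) {
        struct swline *node = malloc(sizeof *node);
        if (node == NULL)
            break;
        node->line = malloc(strlen(in->line) + 1);
        if (node->line == NULL) {
            free(node);
            break;
        }
        strcpy(node->line, in->line);
        node->next = NULL;

        for (i = 0; i < nmods; i++)
            mods[i](node->line);

        *link = node;
        link = &node->next;
    }
    return head;
}

/* Release a list returned by process_words(). */
void free_words(struct swline *list)
{
    while (list != NULL) {
        struct swline *next = list->next;
        free(list->line);
        free(list);
        list = next;
    }
}
```

A real expandstar module would need more than this, since it can turn
one word into several; the in-place signature is only the simplest
shape that shows modules being chained over an swline list.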

,David Norris

World Wide Web - http://www.webaugur.com/dave
Page via mail - 412039@pager.mirabilis.com
ICQ Universal Internet Number - 412039
E-Mail - dave@webaugur.com
Received on Fri Oct 8 10:57:54 1999