Re: mis-spelled words

From: <moseley(at)not-real.hank.org>
Date: Fri Aug 22 2003 - 14:57:14 GMT
On Fri, Aug 22, 2003 at 07:39:26AM -0700, David Hoare wrote:

> One of the things I would like to do however is if a search did not return 
> a hit then check the search words against the indexed words and do some 
> approximate matching (agrep type of thing) and return a link to a new 
> "corrected" search. The equivalent of google's "did you mean _BLAH_" when 
> you misspell something. 
> 
> I use a Linux box and program in tcl or perl for preference.

There's a module included in the swish-e distribution called 
ParseQuery.pm.  It's supposed to take the output from the "Parsed Words:" 
header and, well, parse it.

Then grab the Text::Aspell Perl module from CPAN.  You can use that to 
look up the words parsed by ParseQuery.pm.

You will likely want to create a dictionary of only the words in your 
index.  You can use one of the -T options to extract all the words 
from the index for use in your dictionary.  The Text::Aspell 
docs describe how to create an Aspell dictionary from this list of 
words.  You might decide to create a dictionary for each metaname.

So, when you get "no results" back from swish you can look up each word 
in the query and check if it's spelled correctly and if there are any 
suggestions from Aspell.
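
A minimal sketch of that lookup step, written in Python rather than Perl 
so it runs with the standard library alone.  It uses difflib as a 
stand-in for Aspell, so the matching is plain string similarity rather 
than Aspell's suggestion algorithm.  The function name, the sample word 
list and the cutoff value are assumptions made for illustration; in 
practice the word list would come from the index dump described above.

```python
import difflib


def suggest_for_query(query_words, index_words, max_suggestions=3, cutoff=0.75):
    """Return {word: [suggestions]} for query words missing from the index.

    Words that are present in the index are treated as correctly spelled
    and are left out of the result.
    """
    vocabulary = sorted({w.lower() for w in index_words})
    known = set(vocabulary)
    suggestions = {}
    for word in query_words:
        lowered = word.lower()
        if lowered in known:
            continue  # the word is in the index, nothing to suggest
        # difflib ranks candidates by similarity ratio; this only
        # approximates what a real spell checker such as Aspell would do.
        suggestions[word] = difflib.get_close_matches(
            lowered, vocabulary, n=max_suggestions, cutoff=cutoff
        )
    return suggestions


# Hypothetical word list standing in for the words extracted from the index.
index_words = ["search", "index", "spelling", "dictionary", "metaname"]
result = suggest_for_query(["serach", "index", "zzzz"], index_words)
print(result)
```

A word that gets an empty list back has no close match in the index, so 
there is nothing useful to offer for it.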

It's a bit complicated because a misspelled word can return many, many 
suggestions -- and there can be more than one misspelled word in a 
query.  Also, with boolean searches you can get no results even when 
every term is a valid search word (for example, two words that never 
occur in the same document).
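
One way to keep this manageable is to cap the number of "did you mean" 
queries that get built from the per-word suggestions.  The sketch below 
is in Python and only illustrates the idea; the function name, the 
max_queries limit and the sample data are assumptions.  Words without 
suggestions (including operators such as "and" or "not") pass through 
unchanged.  It does not address the case where every word is spelled 
correctly and the boolean query simply matches nothing.

```python
import itertools


def corrected_queries(query_words, suggestions, max_queries=5):
    """Build up to max_queries candidate queries from per-word suggestions.

    `suggestions` maps a misspelled word to a list of replacements.
    A word with no entry, or with an empty list, is kept as typed.
    """
    options = [suggestions.get(word) or [word] for word in query_words]
    original = " ".join(query_words)
    # product() enumerates every combination lazily, so islice() can stop
    # early without building the full (possibly very large) list.
    candidates = (" ".join(combo) for combo in itertools.product(*options))
    changed = (query for query in candidates if query != original)
    return list(itertools.islice(changed, max_queries))


# Hypothetical suggestions for a query with two misspelled words.
suggestions = {"serach": ["search"], "indx": ["index", "indent"]}
queries = corrected_queries(["serach", "indx", "words"], suggestions)
print(queries)
```

Each returned string can be offered as a link to a new search.  Ordering 
follows the order of the suggestion lists, so putting the best 
suggestion first for each word puts the most likely correction first.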

<he says without the time to do it>
It would not be that hard to hack swish-e to link directly with Aspell 
and do the word lookup at search time and offer a list of suggestions 
for each word not found in the index.
</>

-- 
Bill Moseley
moseley@hank.org
Received on Fri Aug 22 14:57:29 2003