On Mar 18, 2004, at 5:06 PM, swish-e@sunsite.berkeley.edu wrote:
>> http://www.swish-e.org/Discussion/archive/2003-08/6028.html
>>
>> Someone on this forum asked about misspelled words, and the answer was
>> to use the Perl module to look up one of the matches from the system's
>> dictionary. A question arises: what if, after that word is suggested
>> and then searched for, no results are found in the index file?
>
> Eh, reread that post. That was the point of that message --
> using a dictionary built from words that *are* in the index.
Ah, ha! I can contribute something here.
Taking the lead from Bill a number of months ago, I hacked together a
"Did You Mean" function in a number of my swish-based searches. The
technique first involves creating a dictionary of terms from the
content of a swish index. Next, when examining the number of hits from
a swish search, I compare it to a threshold; if the number of hits is
less than the threshold, I grab a number of possible alternative words
from the dictionary and rebuild the initial query.
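The threshold-and-rebuild idea can be sketched in plain Perl without Aspell. This is a minimal illustration under stated assumptions, not the code behind the interfaces listed below: it swaps Aspell's suggestion engine for a simple Levenshtein edit distance, and the helper names (levenshtein, did_you_mean), the sample word list and the threshold value are all hypothetical.

```perl
#!/usr/bin/perl
# Did-you-mean sketch. Hypothetical helpers; edit distance stands in for Aspell.
use strict;
use warnings;
use List::Util qw(min);

# classic dynamic-programming Levenshtein distance between two words
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            # deletion, insertion, substitution
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Return a rebuilt query when $hits is below $threshold; otherwise return
# nothing (undef in scalar context), meaning no suggestion is needed.
sub did_you_mean {
    my ($query, $hits, $threshold, @dictionary) = @_;
    return if $hits >= $threshold;
    my %known = map { lc($_) => 1 } @dictionary;    # words taken from the index
    my @rebuilt;
    for my $word (split ' ', $query) {
        if ($known{ lc $word }) {                    # already an index word
            push @rebuilt, $word;
            next;
        }
        # closest index word by edit distance; ties broken alphabetically
        my ($best) = sort {
            levenshtein(lc $word, $a) <=> levenshtein(lc $word, $b) || $a cmp $b
        } keys %known;
        push @rebuilt, defined $best ? $best : $word;
    }
    return join ' ', @rebuilt;
}

# hypothetical words standing in for the output of swish-e -T INDEX_WORDS_ONLY
my @index_words = qw(library digital catalog serials microforms);
print did_you_mean('digitl libary', 0, 1, @index_words), "\n";
```

Run as-is, it prints "digital library". Aspell's suggestion algorithm also uses sound-alike matching, so it will generally do better than raw edit distance on real misspellings; the sketch only shows where the threshold check and the query rebuild fit.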
Here is some sample code. First, the process to create a dictionary:
#!/usr/bin/perl
# make-dictionary.pl - create an Aspell dictionary from a swish-e index
# Eric Lease Morgan <eric_morgan@infomotions.com>
# Thanks to Bill Moseley, who inspired this hack.
# 2003/12/08 - got it working after reading Perl Cookbook
# 2003/11/27 - first investigations; Thanksgiving
# practice good programming
use strict;
use warnings;
# define a few constants (each command must stay on one line)
my $SWISH  = '/usr/local/bin/swish-e -T INDEX_WORDS_ONLY -f /usr/local/apache/htdocs/books/etc/books.idx';
my $ASPELL = '/usr/local/bin/aspell --lang=en create master /usr/local/apache/htdocs/books/etc/books.dict';
######################################################
# no configuration should be necessary below this line
# initialize the list of words
my $words = '';
# get the list of words from the index
open my $input, '-|', $SWISH or die "Can't run swish-e: $!";
while (<$input>) {
    chomp;                        # remove the trailing newline
    next unless /^[A-Za-z]+$/;    # skip words containing digits or other non-letters
    $words .= "$_\n";             # one word per line for Aspell
}
close $input or die "swish-e failed: $?";
# create the dictionary
open my $output, '|-', $ASPELL or die "Can't run aspell: $!";
print $output $words;
close $output or die "aspell failed: $?";
# done; too simple!
exit;
Second, a code snippet from a search routine, specifically the part that builds the suggestion:
# load the required modules
use strict;
use warnings;
use SWISH::API;
use Text::Aspell;
# define constants
my $INDEX      = './etc/books.idx';
my $DICTIONARY = './etc/books.dict';
my $query      = 'foo and bar';
# create swish object
my $swish = SWISH::API->new($INDEX);
$swish->AbortLastError if $swish->Error;
# create a search object
my $search = $swish->New_Search_Object;
# search
my $results = $search->Execute($query);
$swish->AbortLastError if $swish->Error;
# get the number of titles found
my $number_of_hits = $results->Hits;
# offer a suggestion only when nothing was found
if ( ! $number_of_hits ) {
    # initialize the dictionary built from the index
    my $dictionary = Text::Aspell->new;
    $dictionary->set_option('master', $DICTIONARY);
    # build the new query one word at a time
    my @new_query;
    foreach my $q (split ' ', $query) {
        # take Aspell's first suggestion, or keep the word if there is none
        my @suggestions = $dictionary->suggest($q);
        push @new_query, @suggestions ? $suggestions[0] : $q;
    }
    # add a suggestion to the output
    print 'Did you mean: ', join(' ', @new_query), "?\n";
}
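One possible refinement, sketched below as a hypothetical helper (rebuild_query is not part of SWISH::API or Text::Aspell): the loop in the search snippet sends every token, including swish-e's Boolean operators and, or and not, to the speller. The helper passes those operators and already-known words through unchanged and only asks for suggestions on the rest. The speller is supplied as two code references so the logic can be exercised without Aspell; with Text::Aspell you could pass wrappers around its check and suggest methods, as the comments show. The tiny word lists are made up for illustration.

```perl
#!/usr/bin/perl
# rebuild-query sketch: hypothetical helper, not an existing library function
use strict;
use warnings;

# swish-e Boolean operators are passed through unchanged
my %OPERATORS = map { $_ => 1 } qw(and or not);

# $is_known and $suggest are code refs. With Text::Aspell they could be
#   sub { $speller->check($_[0]) }
#   sub { ($speller->suggest($_[0]))[0] }   # first suggestion, or undef
sub rebuild_query {
    my ($query, $is_known, $suggest) = @_;
    my @out;
    for my $token (split ' ', $query) {
        # keep operators and correctly spelled words as they are
        if ($OPERATORS{ lc $token } || $is_known->($token)) {
            push @out, $token;
            next;
        }
        my $suggestion = $suggest->($token);
        push @out, defined $suggestion ? $suggestion : $token;
    }
    return join ' ', @out;
}

# stand-in speller backed by small hashes, for illustration only
my %words = (library => 1, digital => 1);
my %fixes = (libary => 'library', digitl => 'digital');
print rebuild_query('digitl and libary',
    sub { exists $words{ lc $_[0] } },
    sub { $fixes{ lc $_[0] } }), "\n";
```

Run as-is, it prints "digital and library", leaving the operator in place so the suggested query is still a valid swish-e search.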
I use these techniques in a number of half-baked interfaces. Try
entering misspelled words:
http://infomotions.com/books/
http://infomotions.com/alex2/
http://dewey.library.nd.edu/morgan/microforms/
http://dewey.library.nd.edu/morgan/serials/
http://dewey.library.nd.edu/morgan/microforms/eighteenth/
Fun!
--
Eric Lease Morgan
Head, Digital Access and Information Architecture Department
University Libraries of Notre Dame
(574) 631-8604
Received on Thu Mar 18 15:15:58 2004