Re: no hits with soundex

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Aug 20 2002 - 13:52:32 GMT
On Tue, 20 Aug 2002, David Hoare wrote:

> I have two configuration files which differ only in the switching on of 
> UseSoundex. I have set up MetaNames "year" and "athr" which are extracted 
> from the file path via

My guess is that you will find soundex too fuzzy.  We discussed switching
to metaphone matching, or adding support for it, but nothing has happened
there yet.  I doubt it would be that hard to add.

> MetaNames year athr

(This doesn't hurt, but ExtractPath adds each name to the list of
MetaNames itself, so you don't really need the line above.)

> PropertyNames year athr
> ExtractPath year regex !^/.*/[a-z]/([0-9]+)/.+$!$1!
> ExtractPath athr regex !^/.+/(authors).html$!$1!
> 
> 
> When I search with the nosoundex index for this search I get lots of hits 
> 
> ./bin/swish-e -P ^ -f ./indexfiles/nosoundex.index -w "(smith) athr=authors  and  year=(199* or 2000 or 2001 or 2002)"
> 
> If I do the same search with the soundex file I get no hits 

That's a bug, I suppose.  I didn't look at the code, but by using -T I can
see that numbers are converted into an empty string by soundex, which
would explain why the year terms in your query match nothing.  When the
stemming module can't stem a word it returns the original word.  Soundex
should do the same thing.
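
To illustrate the fallback I mean, here is a rough Python sketch.  It is
not the swish-e code (which I haven't looked at): soundex() below is a
simplified American Soundex, and soundex_or_original() is a made-up name.
I'm assuming numbers come out empty simply because only letters get
encoded.

```python
# Illustrative only: a simplified American Soundex, not swish-e's implementation.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character code, or "" if the word contains no ASCII letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""  # e.g. "2000": nothing to encode
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def soundex_or_original(word):
    """Hypothetical wrapper: keep the original word when soundex yields nothing."""
    return soundex(word) or word


for term in ("smith", "smyth", "2000", "199*"):
    print(term, repr(soundex(term)), repr(soundex_or_original(term)))
```

With the wrapper, "smith" and "smyth" still collapse to the same code,
while "2000" and "199*" pass through unchanged instead of disappearing.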

When trying to debug I create a single test file and do something like:

  ./swish-e -c conf -i test.html -T parsed_words indexed_words properties

  parsed_words     show words as parsed from source
  indexed_words    show words as added to index
  properties       show props added for each file

  -T help will show all available trace (debugging) options


-- 
Bill Moseley moseley@hank.org
Received on Tue Aug 20 13:56:02 2002