Peter,
The fixes that you made appear to have corrected the issue, at least for
the phrases that were reported to me. I am going to push the index to our
dev servers and let the salespeople and some clients test on it next
week, but I think we are all set.
Some thoughts on the size thing: the initial report was that Corey Rich
couldn't be searched, so my limited test used only Corey Rich files. It
may be that the addition of other records caused the problem. Also, no
other photographers with the first name Corey could be searched by their
full (first plus last) names, so maybe it worked until another person
named Corey was added?
I can do more testing of this, but since it is working, I don't know if it
is needed. Let me know if you want me to.
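
If more testing is wanted, something like the following could run a batch
of phrase queries against an index. This is only a rough sketch: it assumes
swish-e is on the PATH and prints its usual "# Number of hits: N" header,
and the phrase list is a hypothetical stand-in for the names that were
actually reported.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the hit count out of swish-e's text output.
# Assumes the usual "# Number of hits: N" header line; returns 0 when
# that line is missing (for example when swish-e prints "err: no results").
sub hit_count {
    my ($output) = @_;
    return $output =~ /^# Number of hits:\s*(\d+)/m ? $1 : 0;
}

# Run one phrase query against an index and return the number of hits.
sub phrase_hits {
    my ($index, $phrase) = @_;
    # List-form pipe open avoids shell quoting problems with the phrase.
    open(my $fh, '-|', 'swish-e', '-f', $index, '-w', qq{"$phrase"})
        or die "can't run swish-e: $!";
    local $/;    # slurp the whole output at once
    my $output = <$fh>;
    close($fh);
    return hit_count(defined $output ? $output : '');
}

# Only query when an index file is given on the command line.
if (@ARGV) {
    my $index = shift @ARGV;
    # Hypothetical phrase list; replace with the reported names.
    for my $phrase ('corey rich', 'corey', 'rich') {
        printf "%-12s %d hits\n", $phrase, phrase_hits($index, $phrase);
    }
}
```

Running it against indexes built from the small set and from the full set
would show whether the full-name phrase only fails once other records are
present.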
Brad
---------------------
Brad Miele
VP Technology
IPNStock.com
866 476 7862 x902
bmiele@ipnstock.com
On Fri, 10 Nov 2006, Peter Karman wrote:
>
>
> brad miele scribbled on 11/10/06 3:22 PM:
>> yes, it seems that the volume of files/words was a factor, since it
>> didn't/doesn't crop up with smaller sets.
>>
>> this test was on the full set, so i am sort of baffled by why that change
>> would make the difference.
>>
>> i guess i should keep looking for a more real solution. the stemmer_en1
>> doesn't seem to do as good of a job (at least according to our
>> salespeople), and we can't seem to make the jump to 2.4.4 with en2
>>
>>>> i find that when i remove the two
>>>> references at the top to:
>>>>
>>>> { FUZZY_STEMMING_EN2, "Stemming_en", Stem_snowball,
>>>> porter_create_env, porter_close_env, porter_stem },
>>>> { FUZZY_STEMMING_EN2, "Stem", Stem_snowball,
>>>> porter_create_env, porter_close_env, porter_stem },
>>> That's just a mapping table -- it maps the config names ("None",
>>> "Stemming_en", etc.) to the code for that stemmer.
>>>
>>> The difference between 2.4.3 and 2.4.4 is that we removed the old
>>> Porter stemmer so Stem and Stemming_en were changed to use the new
>>> snowball stemmer code instead of the old Porter code.
>>>
>
> I took a look at the diffs from 2.4.3 through 2.4.4. Looks like there were a
> couple changes: one where I took out the Stemming_en and Stem options, and
> another when I put them back in with a warning.
>
> The difference when I put them back in however was that instead of being
> FUZZY_STEMMING_EN they were changed to FUZZY_STEMMING_EN2. FUZZY_STEMMING_EN was
> dropped from stemmer.h at the same time.
>
> To make matters more confusing, the error message indicates that the deprecated
> features Stemming_en and Stem will use Stemming_en1 -- but they are marked with
> FUZZY_STEMMING_EN2 even though they call the same init/free functions as
> Stemming_en1.
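
The mismatch described above can be modelled outside of C. The following is
a minimal sketch, not a copy of stemmer.c: the table layout is an assumption,
and FUZZY_STEMMING_EN1 and english_create_env are hypothetical names used
only for illustration. It flags any entries that share an init function but
carry different mode constants.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of the mapping table: config name -> mode constant
# and init function. FUZZY_STEMMING_EN1 and english_create_env are
# assumed names, not taken from the swish-e source.
my @table = (
    { name => 'Stemming_en',  mode => 'FUZZY_STEMMING_EN2', init => 'porter_create_env'  },
    { name => 'Stem',         mode => 'FUZZY_STEMMING_EN2', init => 'porter_create_env'  },
    { name => 'Stemming_en1', mode => 'FUZZY_STEMMING_EN1', init => 'porter_create_env'  },
    { name => 'Stemming_en2', mode => 'FUZZY_STEMMING_EN2', init => 'english_create_env' },
);

# Return the names of entries whose init function appears under more
# than one mode constant.
sub mode_conflicts {
    my @rows = @_;
    my %modes_for;
    $modes_for{ $_->{init} }{ $_->{mode} } = 1 for @rows;
    return map  { $_->{name} }
           grep { scalar(keys %{ $modes_for{ $_->{init} } }) > 1 } @rows;
}

print "suspicious: $_\n" for mode_conflicts(@table);
```

With this assumed table, the two deprecated names and Stemming_en1 are
reported, since they share the Porter init function under two different
mode constants.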
>
> So, there's definitely something suspicious in stemmer.c I think. I'm going to
> commit a change to CVS -- Brad, would you take a look at the CVS version and see
> if that works any better?
>
> And here's a little script to test all the stemmers. Use it like:
>
> perl stemtest.pl wordIwant2stem
>
> and it will show how each stemmer handles wordIwant2stem. Note that the
> SWISH::API 0.04 is required for a working Fuzzify() method.
>
> ------------------------------8<snip--------------------------
> #!/usr/bin/perl
> #
> # test the Swish-e stemmers
> #
> #
> use strict;
> use warnings;
> use SWISH::API; # requires 0.04 or later for working Fuzzify()
>
> my $usage = "usage: $0 word2stem\n";
> my $html = 'stem_test.html';
> my $word = shift @ARGV or die $usage;
>
> # create a throwaway document to index, unless one already exists
> unless (-s $html)
> {
>     open(my $fh, '>', $html) or die "can't write $html: $!";
>     print $fh '<html>some words here that do not matter</html>';
>     close($fh);
> }
>
> my @warm_fuzzies = qw(
> Stemming_en
> Stem
> None
> Soundex
> Metaphone
> DoubleMetaphone
> Stemming_es
> Stemming_fr
> Stemming_it
> Stemming_pt
> Stemming_de
> Stemming_nl
> Stemming_en1
> Stemming_en2
> Stemming_no
> Stemming_se
> Stemming_dk
> Stemming_ru
> Stemming_fi
> );
>
> for my $f (@warm_fuzzies)
> {
>     my $index = i($f);
>     my $swish = SWISH::API->new($index);
>     my $fuzzy = $swish->Fuzzify($index, $word);
>     print "$f -> " . join(' ', $fuzzy->word_list) . "\n";
> }
>
> sub i
> {
>     my $f = shift;
>     my $index = "$f.index";
>     return $index if -s $index; # don't create more than once.
>     system("echo 'FuzzyIndexingMode $f' > config");
>     system("swish-e -i $html -c config -f $index 1>/dev/null");
>     return $index;
> }
> ------------------------------8<snip--------------------------
>
>
>
> --
> Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
>
>
Received on Sat Nov 11 11:31:16 2006