Re: Combining stem/non stem removing dups in perl

From: Brad Miele <brad(at)not-real.auroraquanta.com>
Date: Wed Nov 03 2004 - 23:00:11 GMT

Peter,

pretty much my approach. my only desire would be to get an accurate total
hit count back. the way it is now, i have to bring the entire result set
in, uniq it, get the count, and then loop out the rows that i want, which
i am fearing because of the size, but maybe it isn't as much of an issue
as i think.
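
A minimal sketch of that "bring it all in, uniq it, count, then loop out the rows" flow, assuming each result has already been reduced to a plain hashref. The helper name uniq_page, the 'key' field, and the sample rows are hypothetical and not part of SWISH::API:

```perl
use strict;
use warnings;

# Hypothetical helper: dedup rows by a key, report the true unique total,
# and return only the requested page of rows.
# Each row is a plain hashref standing in for a search result.
sub uniq_page {
    my ($rows, $key, $offset, $limit) = @_;
    my %seen;
    # keep the first occurrence of each key, preserving result order
    my @uniq  = grep { !$seen{ $_->{$key} }++ } @$rows;
    my $total = scalar @uniq;
    # clamp the end of the page to the last unique row
    my $last = $offset + $limit - 1;
    $last = $#uniq if $last > $#uniq;
    my @page = $offset <= $last ? @uniq[ $offset .. $last ] : ();
    return ($total, \@page);
}

# made-up sample data: six rows, four distinct keys
my @rows = map { +{ key => $_ } } qw(a b a c b d);
my ($total, $page) = uniq_page(\@rows, 'key', 1, 2);
print "total=$total page=", join(',', map { $_->{key} } @$page), "\n";
```

The cost is one pass over the full result set plus one hash entry per unique key, so memory grows with the number of unique records rather than with the page size.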

along these lines, and since you were the sucker, err kind soul who
responded first, do you know if there is a way to force a meta for every
record based on config?

basically, if i am indexing /xmldocs, once with stem and once without, i
would like to set a sort value of 0 for the non-stemmed and 1 for the
stemmed, so when i sorted the results, the stemmed would get pushed to the
end of the set. I am thinking that i can do it as a meta that i ignore in
one of the confs, but i would rather not have it in the file... possibly
use a file attribute that i omit from one index?
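
One possible alternative that keeps the sort value out of the files, sketched under assumptions: tag each result with the index it came from and do the 0/1 ordering in perl after the search. The rows below are plain hashrefs with a hypothetical 'dbfile' field, and the index names are made up. Swish-e's swishdbfile auto property may be able to supply that value, but that should be checked against the Swish-e docs:

```perl
use strict;
use warnings;

# Hypothetical index names mapped to the sort values described above:
# 0 for the non-stemmed index, 1 for the stemmed index.
my %sort_value = ( 'exact.index' => 0, 'stem.index' => 1 );

# Order rows so non-stemmed hits come first (by rank within each group),
# then drop duplicates, keeping the first (non-stemmed) copy of each key.
sub exact_first {
    my ($rows) = @_;
    my %seen;
    return grep { !$seen{ $_->{key} }++ }
           sort { $sort_value{ $a->{dbfile} } <=> $sort_value{ $b->{dbfile} }
                  || $b->{rank} <=> $a->{rank} } @$rows;
}

# made-up sample data: doc1 matches in both indexes, doc2 only when stemmed
my @rows = (
    { key => 'doc1', dbfile => 'stem.index',  rank => 1000 },
    { key => 'doc1', dbfile => 'exact.index', rank => 900 },
    { key => 'doc2', dbfile => 'stem.index',  rank => 800 },
);
print join(' ', map { "$_->{key}($_->{dbfile})" } exact_first(\@rows)), "\n";
```

Sorting before deduping means the copy that survives is always the non-stemmed one when a record matches in both indexes.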

disclaimer: i realize that i should just offer my users the choice, but i
have been informed that it would be "tooo confusing".

Brad
------------------------------------------------------------
 Brad Miele
 Technology Director
 AuroraPhotos.com
 (207) 828-8787 x110
 bmiele@auroraphotos.com

 During the next two hours, the system will be going up and down several
times, often with lin~po_~{po       ~poz~ppo\~{ o n~po_~{o[po	 ~y oodsou>#w4k**n~po_~{ol;lkld;f;g;dd;po\~{o


On Wed, 3 Nov 2004, Peter Karman wrote:

> I do something similar and found that your approach (hash keys) was the
> simplest. I use a counter so that I know when I've hit the appropriate
> number of 'hits' based on the initial range I was looking for.
>
> e.g. (CODE UNTESTED AND UNVERIFIED)
>
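> # create the query
> # create api object
> # search, keeping the results object in $results
>
> my $hitsIwant = 20;
> my %uniq;
>
> # NextResult is called on the results object returned by the search
> while (my $result = $results->NextResult)
> {
> 	# 'key' stands for whichever property identifies a record
> 	my $prop = $result->Property( 'key' );
> 	# duplicates collapse into a single hash entry
> 	$uniq{$prop}++;
> 	# stop once enough unique hits have been seen
> 	last if scalar( keys %uniq ) == $hitsIwant;
> }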
>
> Brad Miele wrote on 11/3/04 3:33 PM:
>
> > Hi,
> >
> > I am trying to run a query against two indexes that contain the same set
> > of records. The first index is built without stemming, the second with it. The
> > goal was to have exact matches come up at the start of the results and
> > then the stemmed ones towards the end. i am sorting the stemmed records to
> > the end of the list using a value added during the indexing.
> >
> > The problem is that since the indexes contain the same records, i want to
> > remove the duplicates from the results. I can do it using hash keys during
> > the ->NextResult phase of the search, but it slows things down and then
> > the number returned by Hits is off.
> >
> > Has anyone implemented something along these lines?
> >
> > Brad
> >
>
> --
> Peter Karman  .  http://www.cray.com/craydoc/ .  karman(at)not-real.cray.com
>
Received on Wed Nov 3 15:00:11 2004