Re: [swish-e] faceted search feature in Swish3

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Mon Oct 12 2009 - 19:41:06 GMT

Thomas den Braber wrote on 10/12/2009 02:18 PM:
>> my $facets = $results->facets_for('color')->sort_by_count;
> 
> That is the one I am looking for.
> 
> Can you say something about the extra performance/memory this facets
> search cost ?
> Especially if there are many facet values (> 10000) ?

Everything costs something. You can minimize the cost for the facet 
collection if you can reduce the number of total loops and move the 
evaluating code closer to the compiled language.

The FacetFinder in swish_xapian is in C++ so it is about as fast as it 
can be. You could use that as a benchmark when comparing the equivalent 
code in a language binding like perl.

You have to look at every match, or a representative sample. The Xapian 
MatchDecider is optimized so that as it is doing the result set 
comparison (running the search), it also collects facets. So there is 
only one loop.

With Swish 2.x IIRC you would have to either run 2 searches, one with no 
limit to get the facets and another with a limit to see just the page of 
results you want, or, run 1 search and manage the paging yourself in 
your code. Either way, you have to do the facet collection *after* the 
search has been performed, so you effectively have two loops.

With Swish3 on Xapian it'll happen *while* the search is being 
performed, so it should be somewhat faster.
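
To make the one-loop versus two-loop difference concrete, here is a rough sketch in plain Perl. It uses neither Swish-e nor Xapian: the in-memory @docs list, the regex "search" and both function names are made up for illustration. It only shows that counting facets as each match is accepted gives the same totals as counting them in a second pass.

```perl
use strict;
use warnings;

# Hypothetical in-memory "index": each doc has some text and a color facet.
my @docs = (
    { id => 1, text => 'red apple',   color => 'red'   },
    { id => 2, text => 'green apple', color => 'green' },
    { id => 3, text => 'red bicycle', color => 'red'   },
    { id => 4, text => 'apple pie',   color => 'red'   },
);

# Post-search collection: one loop to find matches, a second to count facets.
sub facets_two_loops {
    my ( $term, $docs ) = @_;
    my @matches = grep { $_->{text} =~ /\b\Q$term\E\b/ } @$docs;    # loop 1
    my %count;
    $count{ $_->{color} }++ for @matches;                            # loop 2
    return ( \@matches, \%count );
}

# In-search collection: count the facet at the moment a doc is accepted.
sub facets_one_loop {
    my ( $term, $docs ) = @_;
    my ( @matches, %count );
    for my $doc (@$docs) {
        next unless $doc->{text} =~ /\b\Q$term\E\b/;
        push @matches, $doc;
        $count{ $doc->{color} }++;
    }
    return ( \@matches, \%count );
}

my ( $m2, $c2 ) = facets_two_loops( 'apple', \@docs );
my ( $m1, $c1 ) = facets_one_loop( 'apple', \@docs );

print "$_: $c1->{$_}\n" for sort keys %$c1;
```

Both functions still run in the binding language here, so the sketch says nothing about the extra cost of crossing into a compiled library.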

The overhead will be more pronounced for large numbers of facet values 
if your calling code is in the binding language rather than in C++ (the 
Xapian core language). A lot of time is spent crossing the boundary 
between the compiled library and the binding language (perl, php, etc).

> 
> If there are many facet values and I only need the top 10, are all facet
> values still loaded into the $facets array ?

Bear in mind I haven't written the code yet ;) , so we could have 
something like:

  $facets = $results->facets_for('color')->sort_by_count_limit(10);

But all the facets will be there in $results, and limiting the number 
returned is mostly a convenience, since the expensive part is in 
building the list to start with, not slicing it once you've got it. You 
don't know what the top 10 are till you have looked at a big enough sample.
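
As a rough illustration of why the limit is cheap compared to the collection, here is a plain Perl sketch. The %count hash stands in for the full facet tally that has to exist first, and top_facets is a made-up helper, not the eventual Swish3 API.

```perl
use strict;
use warnings;

# Hypothetical facet counts, as a full pass over the matches would leave them.
my %count = ( red => 42, green => 7, blue => 19, black => 3, white => 25 );

# Return the $n most frequent facet values as [value, count] pairs.
# Ties are broken alphabetically so the order is deterministic.
sub top_facets {
    my ( $count, $n ) = @_;
    my @sorted =
      sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
    $n = @sorted if $n > @sorted;    # never slice past the end of the list
    return map { [ $_, $count->{$_} ] } @sorted[ 0 .. $n - 1 ];
}

my @top = top_facets( \%count, 3 );
printf "%s: %d\n", @$_ for @top;
```

The sort and slice only touch the distinct facet values, while building %count in the first place means visiting every match.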

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Mon Oct 12 15:41:11 2009