Re: The swish-d cluster system is ready for beta

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Wed Mar 16 2005 - 22:28:54 GMT
Dave Seff scribbled on 3/16/05 4:10 PM:
> On Wed, 2005-03-16 at 13:13 -0800, Bill Moseley wrote:
> 
>>>Each swishd ran the search and returned results to cluster_mgr. Then
>>>cluster_mgr sorted the results by rank to look something like the
>>>following and sent them back to the original client. Notice that the
>>>results are in reverse order by rank:
>>
>>Although the ranks coming from different indexes may not be related,
>>right?  The displayed rank is scaled based on the result set for a
>>given index.  You may have a result from one index with a rank of 900 that
>>seems less related than a result from another index that ranks 800.
>>Still, probably works fine in most cases.  I wonder if swish could give
>>you more of a raw ranking score and then have your manager scale the
>>ranks all at once.
>>
>>Very cool.  Now you have fodder for a publication!
>>
> 
> 
> Hmm . .  I wasn't aware that the ranking system differed between
> indexes. For now it will have to do until I can find a better way to
> sort results. 
> 
> 

Just to make it a little more complicated: the IDF rank scheme relies on the
word counts in each index, and those can make a big difference in ranking.
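
As a rough illustration of why per-index statistics matter, here is a minimal
sketch using the textbook IDF formula, log(N / df). That formula is an
assumption for illustration and not necessarily what swish-e computes
internally; the index statistics are made up.

```python
import math

def idf(total_docs, docs_with_term):
    """Textbook inverse document frequency, log(N / df).

    Assumption: this is the generic formula, not swish-e's own code.
    """
    return math.log(total_docs / docs_with_term)

# Hypothetical statistics for the same search term in two separate indexes.
index_a = {"total_docs": 1000, "docs_with_term": 10}   # term is rare here
index_b = {"total_docs": 1000, "docs_with_term": 500}  # term is common here

idf_a = idf(**index_a)
idf_b = idf(**index_b)

# The same term carries a very different weight in each index, so ranks
# computed per index are not directly comparable.
print(f"index A: {idf_a:.2f}  index B: {idf_b:.2f}")
```

The same word ends up weighted several times more heavily in one index than
in the other, before any per-index scaling of the displayed rank.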

Depending on the kind of docs you're searching, however, rank may or may not
make much difference. If, for example, you're searching small docs from an XML
dump from a database, then ranking (under either scheme, IDF or the default) is
fairly inconsequential in my experience. Sorting by properties will likely give
you more useful results. It all depends on what you want to do with the data set.
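
A minimal sketch of what sorting by a property could look like on the manager
side, assuming each server returns its hits as a list of records. The field
names (path, rank, lastmod) and the merge_by_property helper are hypothetical
and not part of swish-e or swish-d.

```python
from operator import itemgetter

# Hypothetical result lists as a manager might receive them from two servers.
# Each index scales its own ranks, so both top hits show 1000.
results_a = [
    {"path": "a/1.xml", "rank": 1000, "lastmod": "2005-03-01"},
    {"path": "a/2.xml", "rank": 640, "lastmod": "2005-03-14"},
]
results_b = [
    {"path": "b/1.xml", "rank": 1000, "lastmod": "2005-02-20"},
    {"path": "b/2.xml", "rank": 810, "lastmod": "2005-03-15"},
]

def merge_by_property(result_lists, prop, reverse=True):
    """Flatten the per-index result lists and sort on one shared property."""
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=itemgetter(prop), reverse=reverse)

# ISO dates sort correctly as strings, so newest documents come first.
merged = merge_by_property([results_a, results_b], "lastmod")
for hit in merged:
    print(hit["lastmod"], hit["path"])
```

A property such as a date means the same thing in every index, so the merged
order does not depend on how each index scaled its ranks.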

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Wed Mar 16 14:28:56 2005