Re: ranking change

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Aug 04 2004 - 18:03:57 GMT
On Wed, Aug 04, 2004 at 04:08:09AM -0700, Peter Karman wrote:
> For example, if the word 'the' appears in 98% of the docs in your index, 
> it will have an IDF of 1. If the word 'foo' appears in 10% of your docs, 
> it will have an IDF of something greater than 1 (something like 5 or 6, 
> depending on the math, number of docs, etc.). So for a query of 'the 
> foo', docs with more instances of 'foo' will rank relatively higher than 
> docs with fewer instances of 'foo', while instances of 'the' will affect 
> ranking much the same way they do now (that is to say, not much).

Does this affect this config option?

   IgnoreTotalWordCountWhenRanking
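A minimal Python sketch of the IDF weighting Peter describes above. The smoothed formula 1 + ln(N/df) is an assumption, chosen because it gives roughly 1 for a word found in 98% of the docs. With it, 'foo' at 10% lands near 3.3 rather than the "5 or 6" Peter mentions, so treat the exact numbers as illustrative. `idf`, `score` and the toy corpus are hypothetical and are not swish-e code.

```python
import math
from collections import Counter

def idf(term, docs):
    """Smoothed inverse document frequency, 1 + ln(N / df).

    Assumed formula for illustration; the proposed patch may differ.
    """
    n = len(docs)
    df = sum(1 for doc in docs if term in doc)
    if df == 0:
        return 0.0  # term never indexed: contributes nothing
    return 1.0 + math.log(n / df)

def score(query, doc, docs):
    """Sum over query terms of (occurrences in doc) * IDF(term)."""
    counts = Counter(doc)
    return sum(counts[t] * idf(t, docs) for t in query)

# Toy index of 100 docs: 'the' appears in 98 of them, 'foo' in 10
# (docs 0-9 hold 1, 2 or 3 copies of 'foo').
docs = []
for i in range(100):
    words = ["bar"]
    if i < 98:
        words.append("the")
    if i < 10:
        words += ["foo"] * (1 + i % 3)
    docs.append(words)

query = ["the", "foo"]
print(round(idf("the", docs), 2))  # 1.02
print(round(idf("foo", docs), 2))  # 3.3
# Three 'foo' outranks one 'foo', which outranks 'the' alone.
print(score(query, docs[2], docs) > score(query, docs[0], docs) > score(query, docs[50], docs))
```

In this sketch 'the' still adds about 1 point per occurrence to every doc, so it barely changes the relative order. Each extra 'foo' adds about 3.3.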

> IDF has a similar effect to IgnoreLimit or StopWords, but on a smoother 
> scale. A word isn't just in or out (a StopWord or not), but rather has a 
> relative weight compared to all the other words in the index.

But that's not implemented, right?  So is the idea that stopwords
just have a much lower score?
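To make Peter's "in or out" versus "smoother scale" contrast concrete, here is a hypothetical sketch. It compares a binary StopWords-style weight with a continuous IDF weight. The formula 1 + ln(N/df), the index size and the document frequencies are all assumptions made for illustration, not swish-e's actual math.

```python
import math

def stopword_weight(term, stopwords):
    """All-or-nothing: a StopWord is dropped, any other word counts fully."""
    return 0.0 if term in stopwords else 1.0

def idf_weight(df, n):
    """Smooth: weight falls gradually as document frequency df rises.

    Uses the assumed formula 1 + ln(n / df); not swish-e's actual math.
    """
    return 1.0 + math.log(n / df)

N = 1000  # hypothetical index size
stopwords = {"the"}
# Hypothetical document frequencies for a few words.
doc_freq = {"the": 980, "search": 400, "foo": 100, "swish": 5}

for word, df in doc_freq.items():
    print(f"{word:8} stopword={stopword_weight(word, stopwords):.1f} "
          f"idf={idf_weight(df, N):.2f}")
```

Here a very common word is never removed outright. It simply carries less weight relative to rarer words, and the weight changes gradually with document frequency instead of jumping from 1 to 0.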


> 
> I have several other new ranking features in the works, but wanted to 
> get some feedback for this one before I move ahead too much in this 
> direction. Other features might include:
> 
> 	normalizing weight for word density/document length
> 	scaling the IDF to allow for greater granularity in difference
> 	weighting words based on their proximity to other query words

That last one would be nice -- if that worked well then the default
search might be "OR", but the "ANDed" results get ranked much higher.
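Bill's idea can be sketched as a default OR search in which documents matching every term (the "ANDed" results) rank higher, with proximity as a further boost. This is a hypothetical scoring scheme, not something swish-e implements. `min_span`, `score` and the `and_boost` parameter are invented for illustration. The proximity factor is simply the number of distinct query terms divided by the width of the smallest window of words containing all of them.

```python
from collections import Counter

def min_span(tokens, terms):
    """Width of the smallest window of consecutive tokens that contains
    every query term, or None if some term is missing (sliding window)."""
    terms = set(terms)
    if not terms or not terms.issubset(tokens):
        return None
    counts = Counter()
    have = 0      # distinct query terms currently inside the window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in terms:
            counts[tok] += 1
            if counts[tok] == 1:
                have += 1
        # Shrink from the left while the window still covers every term.
        while have == len(terms):
            width = right - left + 1
            if best is None or width < best:
                best = width
            out = tokens[left]
            if out in terms:
                counts[out] -= 1
                if counts[out] == 0:
                    have -= 1
            left += 1
    return best

def score(tokens, query, and_boost=2.0):
    """OR match scored by term frequency; docs containing every term
    (the ANDed results) get a boost that grows as the terms get closer."""
    counts = Counter(tokens)
    base = sum(counts[t] for t in set(query))
    if base == 0:
        return 0.0  # no query term at all: not an OR match
    span = min_span(tokens, query)
    if span is None:
        return float(base)  # only some terms: plain OR score
    proximity = len(set(query)) / span  # 1.0 when the terms are adjacent
    return base * (1.0 + and_boost * proximity)

query = ["quick", "fox"]
docs = {
    "adjacent": "the quick fox".split(),
    "spread": "quick brown dogs chase a fox".split(),
    "partial": "quick quick quick".split(),
    "none": "lazy dog".split(),
}
for name in sorted(docs, key=lambda n: score(docs[n], query), reverse=True):
    print(name, round(score(docs[name], query), 2))
```

With these made-up documents the ranking is adjacent, spread, partial, none. The "partial" document has more matching words than "spread" but still ranks below it, because it lacks "fox".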

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Wed Aug 4 11:04:12 2004