On Wed, Aug 04, 2004 at 04:08:09AM -0700, Peter Karman wrote:
> For example, if the word 'the' appears in 98% of the docs in your index,
> it will have an IDF of 1. If the word 'foo' appears in 10% of your docs,
> it will have an IDF of something greater than 1 (something like 5 or 6,
> depending on the math, number of docs, etc.). So for a query of 'the
> foo', docs with more instances of 'foo' will rank relatively higher than
> docs with fewer instances of 'foo', while instances of 'the' will affect
> ranking much the same way they do now (that is to say, not much).
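To make the 'the foo' example concrete, here is a minimal sketch. It assumes a common smoothed IDF variant, 1 + ln(N / df), which is not necessarily the formula proposed for swish-e. With that variant a word in 98% of docs gets a weight of about 1.02, and a word in 10% of docs gets about 3.3, which is lower than the "5 or 6" estimated above. The index size, document frequencies, and documents are all made up for illustration.

```python
import math

def idf(num_docs, doc_freq):
    # Hypothetical smoothed IDF: 1 + ln(N / df). A word found in nearly every
    # document gets a weight close to 1; rarer words get larger weights.
    if not 0 < doc_freq <= num_docs:
        raise ValueError("doc_freq must be in 1..num_docs")
    return 1.0 + math.log(num_docs / doc_freq)

def score(term_counts, query, weights):
    # Sum over query words of (occurrences in the doc) * (word weight);
    # words missing from `weights` count with weight 1.0 (i.e. no IDF).
    return sum(term_counts.get(w, 0) * weights.get(w, 1.0) for w in query)

N = 1000                                  # assumed index size
doc_freq = {"the": 980, "foo": 100}       # 'the' in 98% of docs, 'foo' in 10%
weights = {w: idf(N, df) for w, df in doc_freq.items()}

doc_a = {"the": 8, "foo": 1}              # lots of 'the', little 'foo'
doc_b = {"the": 2, "foo": 3}              # fewer 'the', more 'foo'
query = ["the", "foo"]

print(round(weights["the"], 2), round(weights["foo"], 2))            # 1.02 3.3
print(score(doc_a, query, {}), score(doc_b, query, {}))              # 9.0 5.0
print(score(doc_b, query, weights) > score(doc_a, query, weights))   # True
```

Without IDF, doc_a wins simply because it has more words overall. With IDF, the extra occurrences of 'foo' outweigh the extra occurrences of 'the', so doc_b moves ahead.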
Does this affect this config option?
IgnoreTotalWordCountWhenRanking
> IDF has a similar effect to IgnoreLimit or StopWords, but on a smoother
> scale. A word isn't just in or out (a StopWord or not), but rather has a
> relative weight compared to all the other words in the index.
But that's not implemented, right? So is the idea that stopwords
just have a much lower score?
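One way to picture the "smoother scale" point, and the question of whether stopwords just end up with a low score, is to compare a yes/no frequency cutoff with a graded IDF weight. This is a hypothetical sketch. `cutoff_weight` is a simplified stand-in in the spirit of StopWords/IgnoreLimit and does not reproduce how swish-e applies IgnoreLimit. `idf_weight` reuses the assumed 1 + ln(N / df) variant, and the 50% threshold and word frequencies are invented.

```python
import math

def cutoff_weight(num_docs, doc_freq, max_fraction=0.5):
    # Binary rule (hypothetical stand-in for a stopword-style cutoff):
    # a word is either dropped entirely (0.0) or counted fully (1.0).
    return 0.0 if doc_freq / num_docs > max_fraction else 1.0

def idf_weight(num_docs, doc_freq):
    # Graded rule: assumed smoothed IDF, 1 + ln(N / df). Common words keep
    # a small positive weight instead of disappearing.
    return 1.0 + math.log(num_docs / doc_freq)

N = 1000
for word, df in [("the", 980), ("and", 600), ("search", 450), ("foo", 100), ("swish", 5)]:
    print(f"{word:8} cutoff={cutoff_weight(N, df):.0f} idf={idf_weight(N, df):.2f}")
```

Under the cutoff, 'and' (60% of docs) and 'search' (45%) land on opposite sides of the line and get 0 and 1. Under IDF they get similar weights (about 1.51 and 1.80), and 'the' is not removed but simply contributes very little.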
>
> I have several other new ranking features in the works, but wanted to
> get some feedback for this one before I move ahead too much in this
> direction. Other features might include:
>
> normalizing weight for word density/document length
> scaling the IDF to allow for greater granularity in difference
> weighting words based on their proximity to other query words
That last one would be nice -- if that worked well then the default
search might be "OR", but the "ANDed" results get ranked much higher.
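The OR-by-default idea could be sketched by scoring every document that matches any query word, then scaling the score by the fraction of query words it contains and adding a bonus when the matched words sit close together. Everything below is hypothetical and is not swish-e code. The function names, the squared coverage factor, and the "matched words / shortest span" proximity bonus are arbitrary choices made for illustration.

```python
def min_window(positions_by_word):
    """Length of the shortest token span containing every word in positions_by_word."""
    events = sorted((pos, word) for word, plist in positions_by_word.items() for pos in plist)
    need = len(positions_by_word)
    counts = {}
    best = None
    left = 0
    for pos, word in events:
        counts[word] = counts.get(word, 0) + 1
        # Shrink from the left while the window still covers every word.
        while len(counts) == need:
            left_pos, left_word = events[left]
            span = pos - left_pos + 1
            if best is None or span < best:
                best = span
            counts[left_word] -= 1
            if counts[left_word] == 0:
                del counts[left_word]
            left += 1
    return best


def rank(tokens, query):
    """Hypothetical OR-query score favoring docs that match all words, close together."""
    terms = set(query)
    positions = {}
    for i, tok in enumerate(tokens):
        if tok in terms:
            positions.setdefault(tok, []).append(i)
    if not positions:
        return 0.0                                   # OR semantics: no query word, no match
    hits = sum(len(p) for p in positions.values())
    coverage = len(positions) / len(terms)           # 1.0 only for "ANDed" matches
    proximity = 0.0
    if len(positions) > 1:
        # 1.0 when all matched words are adjacent, smaller as they spread out.
        proximity = len(positions) / min_window(positions)
    return (hits + proximity) * coverage ** 2


q = ["swish", "ranking"]
for text in ["swish ranking works",
             "swish is a search engine with ranking",
             "swish swish swish",
             "nothing here"]:
    print(f"{rank(text.split(), q):.3f}  {text}")
```

Here "swish swish swish" has the most hits but matches only one of the two words, so the coverage factor drops it below both documents that contain every query word. Among those, the one where the words are adjacent ranks highest.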
--
Bill Moseley
moseley@hank.org
Received on Wed Aug 4 11:04:12 2004