On Thu, Nov 20, 2003 at 11:36:19AM -0800, Roy Tennant wrote:
> We're using swish-e to search a collection of books and I've discovered
> an odd thing with relevance that I'm trying to puzzle out. The
> following search on "freud":
>
> http://texts.cdlib.org/cgi/searchallbooks.pl?search=freud&mode=book+text&sort=relevance
>
> returns a list of books, with the 439th result being the book "Freud
> and His Critics". That book is rife with the word, and yet it is ranked
> very low. I have verified that it is not my CGI that is doing anything
> funny, as a command line search provides the same results. Why is that
> particular book ranked so low, when it has something on the order of
> almost twice as many occurrences of the word "freud" in it as the
> top-ranked book?
Hi Roy,
I can't really tell without looking at the source documents and how you
are indexing them. You can define RAW_RANK when compiling to prevent
swish-e from scaling the rank output, but that alone probably won't give
enough detail. There's also a DEBUG_RANK define that dumps information
about the ranking while searching. See rank.c for details.
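
For readers unfamiliar with compile-time defines, here is a rough sketch of how switches like these usually work. It is not the code in rank.c: the function names scale_rank and report_rank, the TOP_RANK value of 1000, and the output format are all assumptions made for illustration. Only the names RAW_RANK and DEBUG_RANK come from the message above.

```c
#include <stdio.h>

/* Hypothetical illustration only; this is NOT swish-e's rank.c. */

/* Assumed value the best hit is scaled to. */
#define TOP_RANK 1000

/* Return the rank to display for one hit.
 * With RAW_RANK defined, the raw score is passed through untouched.
 * Otherwise it is scaled relative to the best raw score in the result set. */
int scale_rank(int raw, int max_raw)
{
#ifdef RAW_RANK
    (void)max_raw;              /* unused when scaling is disabled */
    return raw;
#else
    if (max_raw <= 0)
        return 0;               /* avoid dividing by zero */
    return (int)((long)raw * TOP_RANK / max_raw);
#endif
}

/* Print one result line; with DEBUG_RANK defined, also dump the
 * intermediate numbers to stderr so the ranking can be inspected. */
void report_rank(const char *doc, int raw, int max_raw)
{
    int shown = scale_rank(raw, max_raw);
#ifdef DEBUG_RANK
    fprintf(stderr, "doc=%s raw=%d max=%d shown=%d\n",
            doc, raw, max_raw, shown);
#endif
    printf("%d %s\n", shown, doc);
}
```

With a typical C compiler, a define of this kind is switched on by passing -DRAW_RANK or -DDEBUG_RANK on the compile line. How best to pass those flags through the swish-e build is not covered here.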
I know I've talked about it a lot, but ranking still needs a major
overhaul.
--
Bill Moseley
moseley@hank.org
Received on Thu Nov 20 22:19:25 2003