Re: Title matches on result top

From: Ander <redna(at)not-real.euskalerria.org>
Date: Tue Mar 09 2004 - 15:20:32 GMT
I think there must be good ranking algorithms, or similar work, already done. Would it be hard to choose the most suitable
algorithm and implement it? ;-) I would be ready to work on it, but I'm not a 'code master'.
Would anyone else be interested?

>
>Subject: [SWISH-E] Re: Title matches on result top
>   From: Bill Moseley <moseley@hank.org>
>   Date: Tue, 9 Mar 2004 05:46:14 -0800 (PST)
>     To: Multiple recipients of list <swish-e@sunsite.berkeley.edu>
>
>On Tue, Mar 09, 2004 at 12:38:17AM -0800, redna@euskalerria.org wrote:
>
>> >You could try tweaking those, but the other problem is that swish
>> >considers to some degree the number of hits in a file, so a large file
>> >may out-rank a smaller file with the word in the title.
>> 
>> Doesn't swish-e convert frequencies into percentages? Would that be a bad idea?
>
>You should look at rank.c.  That and the query processing are 
>long-standing problems that need attention.  Ranking is very basic 
>currently.
>
>There's a mode to consider the length of the document in the rank
>calculations but when I tested the feature it didn't seem to make much
>difference in the ranking -- and in some cases made it worse.
>
>It's subjective, of course.  What I did was index a few small (< 10,000 
>pages) sites and then compare search results with google.  I spent a day 
>playing with small tweaks to rank.c and it was clear that very large 
>files throw off the rank.  One true hack was to limit the number of 
>word hits per document, and that one thing alone made the results match 
>Google's ranking more closely.  I just limited the frequency count to 100.  
>How's that for an ugly hack?
>
>I had also tried limiting the counts to the first X word positions but 
>with less of an effect.  I was expecting that to have more of an effect.
>If you are looking for a document about something you might think that 
>it would be discussed early on in the document.
>
>Swish-e has been used for indexing reasonably small sets of documents,
>so effective searching is often as helpful as the ranking.  Still, I
>hope someone comes along who knows something about ranking and has the
>time to update swish-e's code.
> 
>
>-- 
>Bill Moseley
>moseley@hank.org
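
For anyone who wants to experiment, the ideas discussed in the quoted message (capping the per-document hit count at 100, counting only hits in the first X word positions, and expressing frequency as a percentage of document length) can be sketched roughly as below. This is only a toy illustration in Python, not the code in rank.c: the function names, the title weight of 200, and the additive scoring formula are all assumptions made up for the example.

```python
# Toy sketch only; none of these names or weights come from swish-e.

def toy_rank(body_hits, title_hits, cap=None, title_weight=200):
    """Hypothetical score: body hits plus a fixed bonus per title hit.

    If `cap` is given, the body hit count is limited to that value,
    mirroring the "limit the frequency count to 100" hack.
    """
    if cap is not None:
        body_hits = min(body_hits, cap)
    return body_hits + title_weight * title_hits


def hits_within(positions, limit):
    """Count only the hits whose word position is below `limit`."""
    return sum(1 for p in positions if p < limit)


def relative_frequency(hits, doc_length):
    """Hits as a percentage of the document's word count."""
    return 100.0 * hits / doc_length


# A very large file with many body hits vs. a small file with a title match.
large = dict(body_hits=5000, title_hits=0)
small = dict(body_hits=3, title_hits=1)

print(toy_rank(**large), toy_rank(**small))                    # uncapped scores
print(toy_rank(cap=100, **large), toy_rank(cap=100, **small))  # capped scores
```

With these made-up weights the large file wins when the count is uncapped and loses to the title match once the count is capped at 100, which is the effect described above. Real weights would have to be tuned against actual search results.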
Received on Tue Mar 9 07:20:35 2004