Strange index ranking problem remedied.

From: Scott McDaid <scott(at)not-real.ednet.co.uk>
Date: Fri Mar 09 2001 - 10:14:35 GMT
I discovered a page ranking issue with the indexer that was giving strange
results on a site I was indexing recently.

On the site, there were numerous pages that each had the same drop-down
menu containing the staff members' names. However, on each individual's
page, that particular staff member's name would appear another two or
three times.

The problem I had was that when you searched for a person's name, there
were 40 or so results coming back, most with a rank of 1000. The actual
person's page was way down the list. Ideally it would come top with a
rank of 1000, and the others would have a lesser rank.

I found this bit of code in index.c (the offending lines are commented
out below). Basically, if a word appears fewer than 5 times on a page,
its frequency is automatically set to 5! Hence all the pages were
coming up with rank 1000.

/* Taken out by Scott (edNET) - 27/2/2001.
        if (freq < 5)
                freq = 5;
 */
        d = 1.0 / (double) tfreq;
        e = (log((double) freq) + 10.0) * d;
        if (!ignoreTotalWordCountWhenRanking)
        {
                e /= words;
        }
        else
        {
                /* scale the rank down a bit. a larger has the effect of
                   making small differences in word frequency wash out */
                e /= 100;
        }
        f = e * 10000.0;

I recompiled, re-indexed, and everything worked fine :)

Sorry if this has been posted before.

Regards

Scott.

-- 
Scott McDaid
edNET
t: +44 131 625 5557 (direct dial)
t: +44 131 466 7003 (office)
Received on Fri Mar 9 10:20:19 2001