On Thu, Jul 22, 2004 at 06:55:14AM -0700, Tac wrote:
> I remember reading somewhere that swish internally weighed swishtitle
> heavier than other fields, and h1 tags higher in html documents. Is there
> documentation on how to control this for XML files?
Kind of -- but it's a different system.
The weighting you describe above works by checking the "structure"
bits recorded for each word indexed. Swish stores a little data for
each and every word: the word's position (for phrase matching) and a
structure byte that flags where in an HTML document the word was
found (e.g. in <title>, <h1> or <body>). That data is only recorded
for HTML documents (which is what swish-e was first designed to
index).
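To make that concrete, here is a rough sketch of the idea in Python.
It is not swish-e's code: the flag names, the weights and the helper
functions are all made up for illustration, and the real ranking lives
in rank.c. It only shows how a per-word position plus a structure byte
can drive both ranking and phrase matching.

```python
from dataclasses import dataclass

# Hypothetical structure flags, one bit per HTML context.
# These names and values are illustrative, not swish-e's own.
IN_BODY = 1 << 0
IN_TITLE = 1 << 1
IN_H1 = 1 << 2

# Hypothetical weights: how much a hit in each context adds to the rank.
STRUCTURE_WEIGHTS = {IN_BODY: 1, IN_TITLE: 4, IN_H1: 3}


@dataclass(frozen=True)
class Posting:
    """One occurrence of a word in a document."""
    position: int   # word offset in the document, used for phrase matching
    structure: int  # OR-ed structure flags for where the word was found


def posting_weight(posting):
    """Sum the weights of every structure flag set on this occurrence."""
    return sum(weight for flag, weight in STRUCTURE_WEIGHTS.items()
               if posting.structure & flag)


def rank(postings):
    """Score a word in one document by adding up its occurrence weights."""
    return sum(posting_weight(p) for p in postings)


def is_phrase(first, second):
    """True if some occurrence of the second word directly follows the first."""
    second_positions = {p.position for p in second}
    return any(p.position + 1 in second_positions for p in first)
```

For example, with these made-up weights a word found once in the title
and once in the body scores higher than two body-only hits:

```python
swish = [Posting(0, IN_TITLE), Posting(12, IN_BODY)]
index = [Posting(1, IN_TITLE)]
print(rank(swish))              # 5
print(is_phrase(swish, index))  # True
```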
There's also a config option called MetaNamesRank that's supposed
to allow adjustment of the rank based on metaname. It's listed in
http://swish-e.org/current/docs/SWISH-CONFIG.html
The docs say it isn't implemented, although it is in cvs (I thought
also in the last version of swish -- you could look at rank.c in your
own version). How well it works is up for debate. There's been a lot
of talk about improvements to the ranking code.
--
Bill Moseley
moseley@hank.org
Received on Thu Jul 22 07:32:36 2004