On Tue, May 11, 2004 at 12:24:35PM -0700, Chris Kantarjiev wrote:
> I'm indexing a mail archive (one file per message) and searching
> with swish.cgi. (I'm running 2.4.1.) It was recently pointed
> out to me that "Subject & Body" searches don't find all the
> messages that "Subject" does - that is, if the keyword only
> appears in the subject field, which becomes swishtitle, it
> isn't found by Subject & Body.
That's due to the way your program converts the mail messages.
See, when swish-e indexes an HTML document, it indexes the <title> text
under the "swishdefault" meta name (and flags the words as being in the
title so they rank higher).
So a search like:
    swish-e -w foo
will find foo in the title as well as in the body.
> metanames => [qw/swishdefault swishtitle from all/],
> name_labels => {
> swishrank => 'Rank',
> all => 'Entire message',
> swishtitle => 'Subject Only',
> from => "Poster's Email",
> date => 'Message Date',
> swishdefault => 'Subject & Body',
> },
And there you are telling swish.cgi to search "swishdefault" for
"Subject & Body." But...
> <title>
>
> </title>
> <meta name="precedence" content="list">
> <meta name="swishtitle" content="Girls Aloud's year at the top">
> <meta name="to" content="Name <your@name.here>">
> <meta name="sender" content="your@name.here">
> <meta name="date" content="1066685834">
> <meta name="from" content="Another Name <my@name.here>">
> <meta name="received" content="by wolfe.bbn.com (Postfix, from userid 13274)">
> </head><body>
There the <title> is empty, and you say to index the subject (Girls
Aloud's year at the top) only under the metaname swishtitle.
Using -T indexed_words will show you the difference and why it's not
working like you want.
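To make the difference concrete, here is a toy model in Python of which
words end up under which metaname. It is only a sketch of the behaviour
described in this message, not swish-e's implementation and not what
-T indexed_words prints; index_document and search are made-up names.

```python
from collections import defaultdict

def index_document(title, metas, body):
    """Toy model: map each metaname to the set of words indexed under it."""
    index = defaultdict(set)
    # <title> text and <body> text both land under swishdefault.
    index["swishdefault"].update(title.lower().split())
    index["swishdefault"].update(body.lower().split())
    # A <meta name="X" content="..."> tag is indexed only under metaname X.
    for name, content in metas.items():
        index[name].update(content.lower().split())
    return index

def search(index, word, metaname="swishdefault"):
    # A search looks at exactly one metaname (swishdefault if none is given).
    return word.lower() in index.get(metaname, set())

subject = "Girls Aloud's year at the top"

# As in the quoted message: empty <title>, subject only in a meta tag.
as_posted = index_document("", {"swishtitle": subject}, "some body text")
# Same message with the subject also written into <title>.
with_title = index_document(subject, {"swishtitle": subject}, "some body text")

print(search(as_posted, "year"))                # False
print(search(as_posted, "year", "swishtitle"))  # True
print(search(with_title, "year"))               # True
```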
Searching "-w foo" or "-w swishdefault=foo" doesn't also search
swishtitle; it only searches swishdefault. It just happens that <title>
text gets indexed as swishdefault along with <body> text.
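So one likely fix is to have your converter write the Subject into
<title> as well. A minimal sketch in Python, assuming plain-text,
non-multipart messages; message_to_html is a hypothetical name, not
your actual converter, and the swishtitle meta tag is simply kept as in
your output (whether you still need it depends on your index config,
which I haven't seen).

```python
from email import message_from_string
from html import escape

def message_to_html(raw_message):
    """Hypothetical converter: one mail message in, one HTML document out."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    sender = msg.get("From", "")
    body = msg.get_payload()  # assumes a plain-text, non-multipart message
    return (
        "<html><head>\n"
        # Subject goes into <title>, so its words are indexed under
        # swishdefault along with the body text.
        f"<title>{escape(subject)}</title>\n"
        f'<meta name="swishtitle" content="{escape(subject)}">\n'
        f'<meta name="from" content="{escape(sender)}">\n'
        "</head><body>\n"
        f"<pre>{escape(body)}</pre>\n"
        "</body></html>\n"
    )

raw = (
    "From: Another Name <my@name.here>\n"
    "Subject: Girls Aloud's year at the top\n"
    "\n"
    "Message text goes here.\n"
)
html_doc = message_to_html(raw)
print(html_doc)
```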
> <http://news.bbc.co.uk/1/low/entertainment/tv_and_radio/3207926.stm>
> <http://news.bbc.co.uk/1/low/england/3207822.stm>
You might want to HTML-escape those so they don't look like tags (that
is, if you want to index the words in those links).
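If the converter happened to be in Python, the escaping could look
something like this. Just a sketch using the standard library's
html.escape; escape_body is a made-up helper name.

```python
from html import escape

def escape_body(text):
    # quote=False leaves quotation marks alone; only &, < and > are replaced.
    return escape(text, quote=False)

line = "<http://news.bbc.co.uk/1/low/england/3207822.stm>"
print(escape_body(line))
# prints: &lt;http://news.bbc.co.uk/1/low/england/3207822.stm&gt;
```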
--
Bill Moseley
moseley@hank.org
Received on Tue May 11 12:52:07 2004