Re: [swish-e] exclude line of text from indexing

From: Rob Lingelbach <rob(at)not-real.colorist.org>
Date: Fri Oct 09 2009 - 19:42:52 GMT
An example might serve better to explain.

I want to search my Mailman-generated message archives, which
consist of individual HTML files indexed by swish-e according to my
swish configuration file, for a particular string, say "Fuji TAF".

Because that string appears in the Subject: line of at least one
file, two other files will also contain it: the one for the message
archived immediately before and the one for the message archived
immediately after, according to Mailman's date, thread, subject,
author, etc. indexing. All I really need to do is exclude from
indexing the *lines* that start:

<LI>Previous message:  <some variable text regarding msg Subject>

--and--

<LI>Next message:  <some variable text regarding msg Subject>

Otherwise, any search for text in a message's Subject: header
returns 3 matching files: the one desired, the one after it (which
contains the line "<LI>Previous message: <Subject text here>"), and
the one before it (which contains the line "<LI>Next message:
<Subject text here>").

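The line exclusion described above can be sketched as a small preprocessing step. This is a minimal sketch, not a swish-e feature: it assumes each archive file is run through a filter before swish-e parses it (how that filter is hooked into swish-e is not shown), and that each navigation line keeps its subject text on the same line as the `<LI>Previous message:` / `<LI>Next message:` prefix, as in the samples quoted in this thread. The names `NAV_LINE`, `strip_nav_lines` and `filter_file` are hypothetical.

```python
import re

# Hypothetical pattern for the pipermail navigation lines quoted in this
# thread; leading whitespace and letter case are tolerated (an assumption).
NAV_LINE = re.compile(r"^\s*<LI>\s*(Previous|Next) message:", re.IGNORECASE)


def strip_nav_lines(html: str) -> str:
    """Return the HTML with every Previous/Next message line removed."""
    kept = [line for line in html.splitlines(keepends=True)
            if not NAV_LINE.match(line)]
    return "".join(kept)


def filter_file(path: str) -> str:
    """Read one archive file and return its text without navigation lines."""
    # latin-1 decodes any byte sequence losslessly, so non-ASCII archive
    # text survives unchanged if the result is written back as latin-1.
    with open(path, encoding="latin-1") as fh:
        return strip_nav_lines(fh.read())
```

Deleting whole lines is the bluntest option: the message body, including its own Subject: text, is left untouched, so a search for "Fuji TAF" should then match only the file whose body or headers actually contain it.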
Rob

On Oct 9, 2009, at 4:23 PM, Rob Lingelbach wrote:

>
> On Oct 9, 2009, at 4:16 PM, Peter Karman wrote:
>
>> Rob Lingelbach wrote on 10/09/2009 01:54 PM:
>>> I need to exclude from swish-e indexing lines such as:
>>>
>>> "Next message: <some text>"
>>>
>>> and
>>>
>>> "Previous message: <some text>"
>>>
>>
>> http://swish-e.org/devel/devel_docs/swish-config.html#obeyrobotsnoindex
>
> Thanks for the answer, Peter, but perhaps I wasn't clear:
>
> Every file (every message; there are some 100 thousand of them) has
> pipermail or Mailman markup that includes the previous and next
> messages' Subject: <text>. What these lines have in common is the
> string at the head of the line, such as:
>
> <LI>Previous message: <A HREF="016805.html">[Tig] DVNR 1000 4X4 chasis (etc.)......
> and
> <LI>Next message: <A HREF="016809.html">[Tig] Fuji TAF
>
> so you see, the lines can be identified by their leading text and
> excluded from indexing on that basis; it's not something I can do
> with meta tags per file, is it?
>
> regards
> Rob
>
> --
> Rob Lingelbach
> rob@colorist.org
>
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
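A sketch that builds on the ObeyRobotsNoIndex pointer in the quoted reply: instead of deleting the navigation lines, wrap each one in `<!-- noindex -->` ... `<!-- index -->` comments during preprocessing. This assumes that your swish-e version, with ObeyRobotsNoIndex enabled, skips text between those comment markers; check the linked documentation before relying on it. The pattern and the function name `mark_nav_lines_noindex` are hypothetical.

```python
import re

# Hypothetical pattern for the Previous/Next navigation lines shown above.
NAV_LINE = re.compile(r"^\s*<LI>\s*(Previous|Next) message:", re.IGNORECASE)


def mark_nav_lines_noindex(html: str) -> str:
    """Wrap each navigation line in noindex/index comment markers."""
    out = []
    for line in html.splitlines(keepends=True):
        if NAV_LINE.match(line):
            # Keep the original line ending ("\n", "\r\n" or none) intact.
            body = line.rstrip("\r\n")
            ending = line[len(body):]
            out.append(f"<!-- noindex -->{body}<!-- index -->{ending}")
        else:
            out.append(line)
    return "".join(out)
```

Compared with deleting the lines, this keeps the archive pages visually identical and leaves the exclusion decision to swish-e's own parser.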

Received on Fri Oct 9 15:43:36 2009