Re: Parsing a hypermail archive to exclude headers and footers

From: David L Norris <dave(at)>
Date: Thu Oct 09 2003 - 19:24:35 GMT
On Thu, 2003-10-09 at 14:07, David L Norris wrote:
> > I can't figure out if there is a way to have swish-e just index this
> > part of the document or not.
> You might want to look at script included with
> Also, below I've included the SWISH-E config I use to index my hypermail
> archives with SWISH-E 2.4.  Maybe you can adapt it to your needs.

Oh, also, my Hypermail mhtmlheaderfile includes hypermail-generated values:

      <title>%s - %l</title>
      <meta name="serial" content="%i">
      <link rel="stylesheet" type="text/css" href="/_indexer/index.css">

The IgnoreMetaTags directive causes searches to ignore the HTML elements
it lists.  And hypermail, or at least my version of it, places all the
info you don't want to see inside these elements:
> IgnoreMetaTags <dl> <dt> <dd> <ul> <li> <strong>

A regular search will only look in the body of the message (I think).
But you can perform searches like author="David Norris" to look at just
the author metadata.  The available fields are author, subject,
description, etc.  The description contains the first so many characters
of the email.
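
As a rough illustration of the author="David Norris" style of search, here
is a minimal Python sketch that builds such a query and the matching
swish-e command line.  The helper names and the index file name
hypermail.index are made up for the example, and it assumes the index was
built with author configured as a metaname.  It only uses the standard
swish-e -f (index file) and -w (search words) switches.

```python
import shlex


def metaname_query(metaname: str, phrase: str) -> str:
    """Build a query limiting a phrase search to one metaname.

    Hypothetical helper: double quotes are dropped from the phrase so
    they cannot break the quoted phrase in the query.
    """
    cleaned = phrase.replace('"', "")
    return f'{metaname}="{cleaned}"'


def swish_command(index_file: str, query: str) -> list[str]:
    """Return the argument list for a swish-e search.

    -f names the index file and -w passes the search words.
    """
    return ["swish-e", "-f", index_file, "-w", query]


if __name__ == "__main__":
    # "hypermail.index" is an assumed index file name.
    query = metaname_query("author", "David Norris")
    command = swish_command("hypermail.index", query)
    # Print a shell-safe version of the command instead of running it.
    print(shlex.join(command))
```

Passing the list to subprocess.run would execute the search without any
shell quoting problems, provided swish-e is installed and the index exists.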

 David Norris
  ICQ - 412039
Received on Thu Oct 9 19:28:31 2003