
Re: Parsing a hypermail archive to exclude headers and footers

From: David L Norris <dave(at)not-real.webaugur.com>
Date: Thu Oct 09 2003 - 19:24:35 GMT
On Thu, 2003-10-09 at 14:07, David L Norris wrote:
> > I can't figure out if there is a way to have swish-e just index this
> > part of the document or not.
> 
> You might want to look at index_hypermail.pl script included with
> SWISH-E.
> 
> Also, below I've included the SWISH-E config I use to index my hypermail
> archives with SWISH-E 2.4.  Maybe you can adapt it to your needs.
> 

Oh, also, my Hypermail mhtmlheaderfile includes hypermail-generated
metadata:

<html>
   <head>
      <title>%s - %l</title>
      %A
      %S
      %D
      <meta name="serial" content="%i">
      <link rel="stylesheet" type="text/css" href="/_indexer/index.css">
   </head>
<body>


The IgnoreMetaTags directive tells SWISH-E not to index the text inside
those HTML elements, so searches won't match it.  And hypermail (at
least my version of it) places all the info you don't want to see
inside these elements:
> IgnoreMetaTags <dl> <dt> <dd> <ul> <li> <strong>
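
To see roughly what that leaves searchable, here is a small Python approximation. It uses the standard-library `html.parser`, not SWISH-E's own parser, and drops any text nested inside the six elements named above. It assumes every ignored element has an explicit closing tag. The `indexable_words` helper and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

# Element names taken from the IgnoreMetaTags line above.
IGNORED = {"dl", "dt", "dd", "ul", "li", "strong"}

class IndexableText(HTMLParser):
    """Collect words that are NOT nested inside any ignored element.

    Rough approximation only: assumes each ignored element is explicitly
    closed (html.parser does not infer omitted </li> or </dd> tags).
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many ignored elements we are currently inside
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in IGNORED and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.words.extend(data.split())

def indexable_words(html):
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return parser.words

# Hypothetical fragment shaped like a hypermail message page.
sample_page = """<body>
<ul><li><strong>From:</strong> David L Norris</li></ul>
<p>Oh, also, my Hypermail archive.</p>
</body>"""

print(indexable_words(sample_page))
# → ['Oh,', 'also,', 'my', 'Hypermail', 'archive.']
```

The header list (From, Date, and so on) disappears entirely, while the message body survives.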

A regular search will only look in the body of the message (I think),
but you can perform searches like author="David Norris" to look at just
the author metadata, and likewise subject, description, etc.  The
description field contains a fixed-length excerpt from the start of the
email.
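
A fielded search can be sketched the same way. The hypothetical `field_search` helper below assumes that %A, %S and %D expand to `<meta name="...">` tags named after the field (Author, Subject, Description). Those tag names and the sample header are assumptions. A plain case-insensitive substring test stands in for SWISH-E's word-based phrase matching.

```python
import re
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Gather <meta name="..." content="..."> pairs, keyed by lower-cased name."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("name") and a.get("content") is not None:
                self.meta[a["name"].lower()] = a["content"]

# Matches queries of the form field="some phrase".
FIELD_QUERY = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')

def field_search(html, query):
    """True if the quoted phrase appears in the named meta field.

    Naive substring match; SWISH-E's real phrase search works on words.
    """
    m = FIELD_QUERY.match(query)
    if not m:
        raise ValueError('expected a query like author="David Norris"')
    field, phrase = m.group(1).lower(), m.group(2).lower()
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return phrase in collector.meta.get(field, "").lower()

# Assumed output of %A and %S in the header template (illustration only).
sample_header = """<head>
<meta name="Author" content="David L Norris">
<meta name="Subject" content="Re: Parsing a hypermail archive">
</head>"""

print(field_search(sample_header, 'author="David L Norris"'))  # True
print(field_search(sample_header, 'subject="David"'))          # False
```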

-- 
 David Norris
  http://www.webaugur.com/dave/
  ICQ - 412039
Received on Thu Oct 9 19:28:31 2003