Re: Parsing a hypermail archive to exclude headers and footers

From: David L Norris <dave(at)not-real.webaugur.com>
Date: Thu Oct 09 2003 - 19:12:32 GMT
On Thu, 2003-10-09 at 13:32, Kissman, Paul (BLC) wrote:
> I have a newbie question.
> 
> I have started to create hypermail archives of our majordomo lists in
> order to be able to search them via Swish-E.  (swish-e 2.2.3)

> I can't figure out if there is a way to have swish-e just index this
> part of the document or not.

You might want to look at the index_hypermail.pl script included with
SWISH-E.

Also, below I've included the SWISH-E config I use to index my hypermail
archives with SWISH-E 2.4.  Maybe you can adapt it to your needs.



# Rewrite the files to play nice with our meta data
FileFilter .html /usr/bin/perl "-p -e 's@<!-- body=\"start\" -->@<!-- body=\"start\" --><div>@g;s@<!-- body=\"end\" -->@</div><!-- body=\"end\" -->@g;s@<pre>@<pre><div>@g;s@</pre>@</div></pre>@g' '%p'"
                                                                                                                          
FileRules filename regex /author\.html/
FileRules filename regex /index\.html/
FileRules filename regex /thread\.html/
FileRules filename regex /subject\.html/
                                                                                                                          
DefaultContents HTML2
IndexOnly .html
IndexContents HTML2 .html
                                                                                                                          
PropertyNames author subject date serial
PropertyNamesDate epoch
#PropertyNamesNumeric serial
                                                                                                                          
MetaNames swishtitle swishdescription author subject date epoch serial
PresortedIndex serial epoch
                                                                                                                          
StoreDescription HTML2 <div> 128
                                                                                                                          
IgnoreMetaTags <dl> <dt> <dd> <ul> <li> <strong>
                                                                                                                          
MetaNamesRank 1 author
MetaNamesRank 1 epoch
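
For anyone who finds the perl one-liner in the FileFilter hard to read, here is a rough Python sketch of the same four substitutions. It is only an illustration of what the filter does to each page, not something SWISH-E runs; the function name wrap_body and the sample page are made up for the example. The idea, as far as the config above suggests, is to wrap the message body in a <div> so that the StoreDescription line has an element to take the description from.

```python
# Illustrative re-statement of the perl -p filter; names here are hypothetical.
# Each pair is (text to find, replacement), applied globally and in order,
# matching the four s@...@...@g expressions in the FileFilter line.
SUBSTITUTIONS = [
    ('<!-- body="start" -->', '<!-- body="start" --><div>'),
    ('<!-- body="end" -->', '</div><!-- body="end" -->'),
    ('<pre>', '<pre><div>'),
    ('</pre>', '</div></pre>'),
]


def wrap_body(html: str) -> str:
    """Wrap the hypermail message body (and any <pre> block) in a <div>."""
    for old, new in SUBSTITUTIONS:
        html = html.replace(old, new)
    return html


if __name__ == "__main__":
    # Made-up fragment shaped like a hypermail message page.
    sample = (
        '<!-- body="start" -->\n'
        '<p>Hello list</p>\n'
        '<pre>quoted text</pre>\n'
        '<!-- body="end" -->\n'
    )
    print(wrap_body(sample))
```

Header and footer markup outside the body="start" / body="end" comments is left untouched, so only the message text ends up inside the added <div>.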


-- 
 David Norris
  http://www.webaugur.com/dave/
  ICQ - 412039
Received on Thu Oct 9 19:17:30 2003