Hi,
I've been using swish-e to index my email for some time now,
but without paying any attention to the binary MIME
attachments that the messages contain. Now I would like to
index the MS-Word and PDF attachments as well.
I've written a Perl script that uses MIME attachment
processing code from CPAN to extract the attachments and hand
them off to SWISH::Filter for filtering. So far, so good; all
the parts are processed properly and swish-e inputs are
produced.
However, each MIME message will produce at least two
parts, with different content types. The email text will be
text/plain, but perhaps the filtered PDF will be HTML. So one
email would produce multiple outputs, something like:
Path-Name: ./1964
Content-Length: 1001
Last-Mtime: 1092715695
Document-Type: TXT*

text text text ...
Path-Name: ./1964
Content-Length: 49099
Last-Mtime: 1092753193
Document-Type: HTML*

<html> ..... </html>
Can swish-e handle this? Two separate inputs for the same
file? Can those outputs be of different content types? I
suppose the alternative is to attempt to convert everything to
text/plain, combine content lengths, and feed swish-e just one
input per file.
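To make that fallback concrete, here is a minimal sketch of what I have in
mind. It assumes every part has already been reduced to plain text (the HTML
stripping is not shown) and that each part is an undecoded byte string, so
that length() counts bytes. The name build_record is only for illustration,
and the caller is assumed to pass in whichever Last-Mtime it wants to keep.

```perl
use strict;
use warnings;

# Hypothetical helper: merge several plain-text parts of one message
# into a single record with one set of headers.
sub build_record {
    my ($path, $mtime, @parts) = @_;

    # Join the parts with a newline so words from adjacent parts
    # do not run together.
    my $body = join "\n", @parts;

    # Content-Length must describe the combined body, so it is
    # recomputed here rather than summed from the individual parts
    # (the separators add bytes too).
    my $length = length $body;

    return "Path-Name: $path\n"
         . "Content-Length: $length\n"
         . "Last-Mtime: $mtime\n"
         . "Document-Type: TXT*\n"
         . "\n"
         . $body;
}

print build_record('./1964', 1092753193,
                   "text text text ...",
                   "text extracted from the PDF ...");
```

The obvious cost is that any structure in the filtered HTML is thrown away,
which is why I would rather keep the parts separate if that is possible.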
Thanks,
Andy
--
Andy Jacobson
andyj@aos.princeton.edu
Program in Atmospheric and Oceanic Sciences
Sayre Hall, Forrestal Campus
Princeton University
PO Box CN710 Princeton, NJ 08544-0710 USA
Tel: 609/258-5260 Fax: 609/258-2850
Received on Tue Aug 17 07:42:18 2004