Re: MIME Types of zipped files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Nov 10 2005 - 13:58:55 GMT
On Thu, Nov 10, 2005 at 05:18:55AM -0800, Lars D. Noodén wrote:
> On Thu, 10 Nov 2005, Bill Moseley wrote:
> > You can't have two XML declarations in the file.  Libxml2 will just
> > stop parsing.
> 
> My mistake.  Couldn't files have multiple declarations in SGML?

Even if you could just cat them together, the result probably would not
be that useful in the general case.
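To see why plain concatenation fails, here is a minimal Python sketch. It uses Python's expat-based xml.etree.ElementTree rather than libxml2, but any conforming parser should reject the result, because XML allows the declaration only at the very start of a document and allows only one root element. The two strings are made-up stand-ins for meta.xml and content.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for meta.xml and content.xml: each is a complete
# XML document with its own declaration and its own root element.
META = '<?xml version="1.0" encoding="UTF-8"?>\n<meta><title>Report</title></meta>\n'
CONTENT = '<?xml version="1.0" encoding="UTF-8"?>\n<body><p>Hello</p></body>\n'


def parses(xml_text: str) -> bool:
    """Return True if xml_text is a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(META))            # True: each file is fine on its own
print(parses(META + CONTENT))  # False: "cat meta.xml content.xml" is not
```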

I would think you would be happier with the results if you created a
new, empty XML document, walked meta.xml fetching the meta and dc tags,
rewrote them using simple tags (e.g. <printed> instead of
<meta:print-date>), and then grabbed all the content nodes from the
content.xml file and placed them in <content>.
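A rough Python sketch of that approach follows. It assumes an OpenDocument (.odt) package and uses only the standard-library zipfile and xml.etree.ElementTree modules. The output element names (<document>, <title>, <author>, <printed>, <content>, <p>) and the META_TAGS mapping are hypothetical choices made for illustration. SWISH::Filter does not define them, and the real filters are written in Perl, so treat this as an outline of the steps rather than a drop-in filter.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by OpenDocument (.odt) files. OpenOffice.org 1.x
# (.sxw) files use different URIs (e.g. http://openoffice.org/2000/meta),
# so adjust these if you index the older format.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical mapping from namespaced metadata tags to simple tag names.
META_TAGS = {
    "dc:title": "title",
    "dc:creator": "author",
    "dc:description": "description",
    "meta:print-date": "printed",
}


def qname(prefixed: str) -> str:
    """Turn 'dc:title' into ElementTree's '{uri}title' form."""
    prefix, local = prefixed.split(":")
    return "{%s}%s" % (NS[prefix], local)


def simplify(doc_bytes: bytes) -> bytes:
    """Build one small, single-root XML document from meta.xml and content.xml."""
    out = ET.Element("document")
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as zf:
        # Copy selected metadata into simple, un-namespaced tags.
        meta_root = ET.fromstring(zf.read("meta.xml"))
        for src, dst in META_TAGS.items():
            node = meta_root.find(".//" + qname(src))
            if node is not None and node.text:
                ET.SubElement(out, dst).text = node.text

        # Collect the text of headings and paragraphs in document order.
        content_root = ET.fromstring(zf.read("content.xml"))
        content = ET.SubElement(out, "content")
        wanted = {qname("text:h"), qname("text:p")}
        for node in content_root.iter():
            if node.tag in wanted:
                text = "".join(node.itertext()).strip()
                if text:
                    ET.SubElement(content, "p").text = text
    return ET.tostring(out, encoding="utf-8")
```

Calling simplify() on the bytes of an .odt file returns a single well-formed document with one root element, which libxml2 can parse in one pass.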

You could also get fancy and generate HTML; the advantage there is that
text inside tags such as <title>, <h1>, and <em> can rank a bit higher
in search results.  I suppose there's a way to use OpenOffice itself to
open the document and generate HTML, but that would be slow.
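If you go the HTML route, the conversion can be as simple as wrapping the extracted pieces in the tags you want weighted. This Python sketch assumes you already have the document title and a list of (kind, text) pairs, for example from walking content.xml. The function name and input shape are hypothetical. Headings become <h1> and everything else becomes <p>. Emphasized spans (<em>) are not handled.

```python
from html import escape
from typing import Iterable, Tuple


def to_html(title: str, blocks: Iterable[Tuple[str, str]]) -> str:
    """Render extracted document parts as simple HTML.

    `blocks` is a hypothetical list of (kind, text) pairs: kind "h"
    marks a heading (rendered as <h1>), anything else a paragraph.
    All text is HTML-escaped so stray '<' or '&' cannot break the markup.
    """
    parts = ["<html><head><title>%s</title></head><body>" % escape(title)]
    for kind, text in blocks:
        tag = "h1" if kind == "h" else "p"
        parts.append("<%s>%s</%s>" % (tag, escape(text), tag))
    parts.append("</body></html>")
    return "\n".join(parts)


print(to_html("Q&A", [("h", "Intro"), ("p", "a < b")]))
```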

> I've been looking at the other filters, particularly Pdf2HTML.pm and 
> XLtoHTML.pm, but if XML can't handle more than one declaration per file, 
> then my intended approach won't work.

I just don't think blindly cat'ing the files together is the way to
go.


> Instead, could SWISH::Filter pass the file to multiple filters, with each 
> one getting passed to 'prog'[1] separately?  One pass 
> could get the content, the second the metadata, etc.
>  	http://swish-e.org/docs/filter.html#writing_filters

Then each file gets indexed more than once, and you end up with
duplicate entries in the search results.
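Folding the metadata and content into a single document per file avoids that problem, because each file then produces exactly one record. Below is a Python sketch of writing one such record. It assumes swish-e's "-S prog" input format: a header block with Path-Name, Content-Length, and optionally Last-Mtime and Document-Type, then a blank line, then Content-Length bytes of content. Check the header names and the "XML*" parser name against the swish-e documentation for your version. The function name and example path are hypothetical.

```python
import sys


def prog_record(path: str, body: bytes, mtime: int) -> bytes:
    """Format one document as a single record for swish-e's "-S prog" method.

    Assumed header format (verify against your swish-e docs):
    Content-Length is the body length in *bytes*, Last-Mtime is seconds
    since the epoch, and "XML*" asks swish-e for its XML parser.
    """
    header = (
        "Path-Name: %s\n"
        "Content-Length: %d\n"
        "Last-Mtime: %d\n"
        "Document-Type: XML*\n"
        "\n" % (path, len(body), mtime)
    )
    return header.encode("utf-8") + body


if __name__ == "__main__":
    # Hypothetical example: one combined XML body for one source file,
    # so the file appears once in the index instead of once per pass.
    body = "<document><title>Report</title><content><p>Hello</p></content></document>".encode("utf-8")
    sys.stdout.buffer.write(prog_record("docs/report.odt", body, 1131631135))
```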



-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Thu Nov 10 05:58:57 2005