Re: Draft of OpenDocument filter

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Nov 17 2005 - 12:52:47 GMT
On Thu, Nov 17, 2005 at 04:36:05AM -0800, Lars D. Noodén wrote:
> The mimetypes you list are not for OpenDocument but the immediate 
> predecessor.  They're close and the module should also work with them, but 
> technically it's a different format.  I'll try to make sure the module 
> works with them, too.

Ah, ok.  Those are what are set on my machine for use with OpenOffice.

> 
> Archive::Zip sounds like a good idea, but I had wanted to limit the number 
> of additional modules needed and the Pdf2html filter was my model.

Archive::Zip is available as a Windows perl package, which might be
easier for Windows users to install than the binary.

    http://ppm.activestate.com/BuildStatus/5.8.html

What I'd probably do is use Archive::Zip if it is available, and
otherwise fall back to the unzip binary.
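
A rough sketch of that fallback, not tested against the draft filter.
It assumes the filter only needs the content.xml member (where
OpenDocument packages keep the document body) and that an "unzip"
binary is on the PATH. The name read_content_xml is just a placeholder
for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw content.xml from an OpenDocument
# file, or undef on failure. Uses Archive::Zip when it is installed and
# otherwise runs the external unzip program.
sub read_content_xml {
    my ($file) = @_;

    # Load Archive::Zip at run time so a missing module is not fatal.
    if ( eval { require Archive::Zip; 1 } ) {
        my $zip = Archive::Zip->new;
        return undef unless $zip->read($file) == Archive::Zip::AZ_OK();
        # contents() returns (data, status) in list context, so force
        # scalar context to get just the data.
        return scalar $zip->contents('content.xml');
    }

    # Fallback: "unzip -p" writes the named member to stdout.
    # The list form of open keeps the file name away from the shell.
    open my $fh, '-|', 'unzip', '-p', $file, 'content.xml'
        or return undef;
    local $/;    # slurp the whole stream
    my $xml = <$fh>;
    close $fh;   # sets $? to the exit status of unzip
    return $? == 0 ? $xml : undef;
}
```

A filter would then call something like
my $xml = read_content_xml($path) and skip the document if the result
is undef. The list form of pipe open may not work with older perls on
Windows, which is one more reason to prefer the module there.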

The PDF filter uses the binary because there is no library and
associated perl module.  I discussed creating a library and module
with the author, but he was not ready to allow that at the time, IIRC.

> There is actually someone already working on an OpenDocument to XHTML 
> conversion using XSLT:
>  	http://books.evc-cit.info/odf_utils/odt_to_xhtml.html
> 
> Converting to XHTML or using the XML parser to extract or rewrite certain 
> fields seems a lot more work than using aliases in the swish config file 
> to map the tag names.  However, mapping means that the config file has to 
> be set up correctly.

There should probably be two filters: one that just creates an
HTML-like document, and one like yours for people who want the raw
document for more control when indexing.

Thanks,

-- 
Bill Moseley
moseley@hank.org

Received on Thu Nov 17 04:52:48 2005