Re: Problem indexing OpenOffice files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue May 20 2003 - 15:50:33 GMT

On Tue, May 20, 2003 at 07:50:15AM -0700, Ivo Mans wrote:
> I'm trying to index OpenOffice files (on a furthermore perfect working swish-e installation).
> I've added following lines in my config:
> 
> FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg)$/i
> IndexContents XML* .sxw .sxc .sxg
> StoreDescription XML <text> 20000

The parser names are confusing here.

XML is one parser, based on expat.
XML2 is another parser, based on libxml2.

XML* says use the libxml2 parser if available, but fall back to expat otherwise.

So IndexContents XML* is really XML2 if you have libxml2 installed, but you are
using StoreDescription XML, which names the expat parser.  Try StoreDescription XML*
so the two directives match up.
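
Something like the following is what I mean. It's an untested sketch: the only
change from your config is the parser name on StoreDescription, and it assumes
StoreDescription accepts XML* the same way IndexContents does. The <text> tag
and the 20000 size are left exactly as you had them; I haven't checked whether
<text> is the right element for OpenOffice's content.xml.

```
# Pipe each OpenOffice file through unzip and index the extracted content.xml
FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg)$/i

# XML* = libxml2 parser if available, otherwise expat
IndexContents XML* .sxw .sxc .sxg

# Same parser name as IndexContents above (was plain XML)
StoreDescription XML* <text> 20000
```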

It's confusing, yes.

> Resulting in error message:
> Warning: XML parse error in file './QU030423im01.sxw' line 2.  Error: not well-formed
>  (93 words)
> 
> This goes for many or all of the OO-files on our network, created with recent OO-versions
> (mostly the latest v.1.0.3.1). Looking manually to the unzipped result looks like a fine
> XML-file to me, although too complex to be 100% sure.
> 
> The unzipped content:
> line 1: <?xml version="1.0" encoding="UTF-8"?>
> line 2: All other data, including style definitions: can be extreme long line

Where's the opening tag?  A well-formed XML document needs a single root element
after the XML declaration, like this:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
   ....
</foo>


-- 
Bill Moseley
moseley@hank.org
Received on Tue May 20 15:50:35 2003