Bill Moseley wrote:
>On Tue, May 20, 2003 at 07:50:15AM -0700, Ivo Mans wrote:
>
>
>>I'm trying to index OpenOffice files (on an otherwise perfectly working swish-e installation).
>>I've added the following lines to my config:
>>
>>FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg)$/i
>>IndexContents XML* .sxw .sxc .sxg
>>StoreDescription XML <text> 20000
>>
>>
>
>Try StoreDescription XML* so it matches up.
>
Just tried. No change.
>>Resulting in error message:
>>Warning: XML parse error in file './QU030423im01.sxw' line 2. Error: not well-formed
>> (93 words)
>>
>>This happens with many, if not all, of the OO files on our network, created with recent OO versions
>>(mostly the latest, v1.0.3.1). Inspecting the unzipped result manually, it looks like a fine
>>XML file to me, although it is too complex to be 100% sure.
>>
>>The unzipped content:
>>line 1: <?xml version="1.0" encoding="UTF-8"?>
>>line 2: all other data, including style definitions; this can be an extremely long line
>>
>>
>
>Where's the opening tag?
>
>
Here is the opening tag (as I said, the original is all on one line):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE office:document-content PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "office.dtd">
<office:document-content
xmlns:office="http://openoffice.org/2000/office"
xmlns:style="http://openoffice.org/2000/style"
xmlns:text="http://openoffice.org/2000/text"
xmlns:table="http://openoffice.org/2000/table"
xmlns:draw="http://openoffice.org/2000/drawing"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:number="http://openoffice.org/2000/datastyle"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:chart="http://openoffice.org/2000/chart"
xmlns:dr3d="http://openoffice.org/2000/dr3d"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="http://openoffice.org/2000/form"
xmlns:script="http://openoffice.org/2000/script" office:class="text"
office:version="1.0">
...
</office:document-content>
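One way to settle whether content.xml is really well-formed, independently of swish-e, is to run it through a different XML parser. Below is a minimal sketch, assuming Python 3 is available; the function names are hypothetical. It reads the same content.xml member that the `unzip -p "%p" content.xml` filter in the config above extracts, and reports the line and column of the first parse error, if any.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET


def check_xml_bytes(data):
    """Return None if data is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # (line, column) as reported by expat
        return (line, column, str(err))
    return None


def check_content_xml(archive):
    """Check content.xml inside an OpenOffice .sxw/.sxc/.sxg file.

    `archive` may be a path or a binary file-like object; this reads the
    same member that `unzip -p file content.xml` would print.
    """
    with zipfile.ZipFile(archive) as zf:
        return check_xml_bytes(zf.read("content.xml"))


if __name__ == "__main__":
    # Usage: python3 check_oo.py file1.sxw file2.sxc ...
    for name in sys.argv[1:]:
        result = check_content_xml(name)
        if result is None:
            print(f"{name}: content.xml is well-formed")
        else:
            line, column, message = result
            print(f"{name}: line {line}, column {column}: {message}")
```

Run it as `python3 check_oo.py QU030423im01.sxw` (the script name is hypothetical). If this parser also reports an error on line 2, the file itself is at fault; if it reports none, that points towards the filter or swish-e's parser rather than the document.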
Received on Tue May 20 17:48:02 2003