Bill Moseley wrote:
> On Wed, Nov 28, 2007 at 11:47:12PM +0100, Tomasz Chmielewski wrote:
>> Peter Karman wrote:
>>> Tomasz Chmielewski wrote on 11/26/07 2:39 AM:
>>>
>>>> Without the asterisk (*), it works just fine for me.
>>>> What difference does it make?
>>>>
>>> if you have libxml2 installed, then XML* == XML2
>>> if not, then XML* == XML
>>>
>>> So if you have compiled with libxml2, and you specify one directive as XML and
>>> the other as XML*, then you are using 2 different parsers.
>> Hmm, with the asterisk (*), like below, it doesn't work for me:
>>
>> IndexContents HTML* .html
>>
>>
>> # swish-e -c mysite.conf
>>
>> Indexing Data Source: "File-System"
>> Indexing "/srv/www/vhosts/wpkg.org/mailman/archives/public"
>> /srv/www/vhosts/wpkg.org/mailman/archives/public/wpkg-announce/2005-December/000000.html:7:
>> error: htmlParseEntityRef: expecting ';'
>> s.wpkg.org?Subject=%5BWpkg-announce%5D%20wpkg-0.9.2-test1%20released&In-Reply-To
>>
>> ^
>
> Are you correctly escaping your URLs? & should be &amp;
>
> Run your page through a html validator.
It's generated by Mailman, so I can't do much about it. Mailman
generates "almost" valid HTML though.
The URL in question is:
<A
HREF="mailto:blah-announce%40lists.wpkg.org?Subject=%5BWpkg-announce%5D%20blah%20released&In-Reply-To="
TITLE="subject">blah at wpkg.org
</A>
swish-e parses the whole Mailman archive just fine when I use
"IndexContents HTML .html"; if I add an asterisk ("IndexContents HTML* .html"),
it reports these errors.
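
Since the Mailman output itself can't easily be changed, one possible workaround is to escape the bare ampersands before the pages reach the parser. The following is only a minimal sketch in Python: the function name escape_bare_ampersands is made up for illustration, it treats the page as plain text (so it would also touch ampersands inside scripts or comments), and how it would be hooked into the indexing run is not shown here.

```python
import re
import sys

# Matches an "&" that does NOT already start a character reference
# such as &amp; (named), &#38; (decimal) or &#x26; (hexadecimal).
BARE_AMP = re.compile(
    r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)"
)


def escape_bare_ampersands(html: str) -> str:
    """Replace every bare '&' with '&amp;', leaving existing references alone."""
    return BARE_AMP.sub("&amp;", html)


if __name__ == "__main__":
    # Use as a stdin-to-stdout filter: read a page, write the escaped page.
    sys.stdout.write(escape_bare_ampersands(sys.stdin.read()))
```

Applied to the link above, "released&In-Reply-To=" would become "released&amp;In-Reply-To=", while text that is already escaped stays unchanged.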
--
Tomasz Chmielewski
http://wpkg.org
Received on Thu Nov 29 10:57:43 2007