Re: [swish-e] (null) problem - yes, I've read the FAQ

From: Peter Karman <peter(at)>
Date: Thu Nov 29 2007 - 16:16:36 GMT

On 11/29/2007 09:57 AM, Tomasz Chmielewski wrote:

> swish-e parses all Mailman archive just fine when I use "IndexContents 
> HTML .html"; if I add an asterisk (* - "IndexContents HTML* .html"), it 
> reports these errors.

that's because HTML* selects the libxml2 parser, which is stricter than the older
HTML parser.

that's also likely why you don't get SwishDescription to work with libxml2 (HTML*),
because the parsing is failing and stopping at that point.

I'd suggest either using HTML all the way through instead of HTML*, or running your
mailman stuff through libtidy before indexing it with HTML*.
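
a rough sketch of the second option, in case it helps. it assumes the `tidy`
command-line tool (the front end to libtidy) is installed and on your PATH, and
that rewriting the archive files in place is acceptable. the function names and
the archive path are made up for illustration, so adjust to taste and try it on
a copy first.

```python
import subprocess
from pathlib import Path


def tidy_command(path, tidy_bin="tidy"):
    """Build the tidy argument list for cleaning one HTML file in place."""
    return [
        tidy_bin,
        "-q",                     # quiet: suppress nonessential output
        "-m",                     # modify the input file in place
        "-asxhtml",               # emit well-formed XHTML
        "--force-output", "yes",  # still write output when errors are found
        str(path),
    ]


def tidy_tree(root, tidy_bin="tidy"):
    """Run tidy on every .html file under root.

    Returns the list of files for which tidy reported errors.
    """
    failed = []
    for path in sorted(Path(root).rglob("*.html")):
        result = subprocess.run(tidy_command(path, tidy_bin), capture_output=True)
        # tidy exits 0 when clean, 1 on warnings, 2 on errors
        if result.returncode >= 2:
            failed.append(path)
    return failed
```

calling something like `tidy_tree("/path/to/mailman/archives")` (hypothetical
path) before running swish-e with HTML* should give libxml2 cleaner input. any
files it returns are ones tidy still reported errors for, so check those by hand.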

