Re: [swish-e] (null) problem - yes, I've read the FAQ

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu Nov 29 2007 - 16:16:36 GMT
On 11/29/2007 09:57 AM, Tomasz Chmielewski wrote:

> 
> swish-e parses all Mailman archives just fine when I use "IndexContents 
> HTML .html"; if I add an asterisk (* - "IndexContents HTML* .html"), it 
> reports these errors.
> 


That's because libxml2 (HTML*) is a stricter parser: it reports errors on malformed markup that the older expat parser (HTML) tolerates.

That is also the likely reason SwishDescription doesn't work for you with libxml2 (HTML*):
parsing fails and stops at the point of the error.

I'd suggest either using HTML throughout instead of HTML*, or running your Mailman
archives through libtidy to clean them up before indexing them with HTML*.
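A rough swish-e config sketch of the two options follows. Option 1 simply goes back to the parser that already handles the archive. Option 2 assumes the command-line `tidy` program (built on libtidy) is installed and on the PATH, and uses swish-e's FileFilter directive to pipe each .html file through it before libxml2 sees it. The tidy flags are my assumption, and this has not been tested against a Mailman archive.

```
# Option 1: stay with the HTML parser that already handles the archive
IndexContents HTML .html

# Option 2 (untested sketch): clean each page with tidy, then parse with libxml2.
# Assumes the `tidy` binary is on the PATH; %p is replaced by the file's path,
# and tidy writes the cleaned page to stdout for swish-e to read.
# To try it, comment out Option 1 and uncomment these two lines:
# FileFilter .html tidy "-q -asxhtml --force-output yes --show-warnings no '%p'"
# IndexContents HTML* .html
```

Use one option or the other, not both.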

-- 
Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/

_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Thu Nov 29 11:16:36 2007