On 11/29/2007 09:57 AM, Tomasz Chmielewski wrote:
>
> swish-e parses all Mailman archive just fine when I use "IndexContents
> HTML .html"; if I add an asterisk (* - "IndexContents HTML* .html"), it
> reports these errors.
>
That's because libxml2 (HTML*) is a stricter parser; the older built-in parser (HTML) is more lenient.
That's also likely why SwishDescription doesn't work for you with libxml2 (HTML*):
parsing fails and stops at that point.
I'd suggest either using HTML throughout instead of HTML*, or running your
Mailman archives through libtidy before indexing them with HTML*.
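
A minimal sketch of the first option, as a config fragment. The file
extensions and the 200-character description limit are assumptions made
for illustration, so adjust them to your setup:

```
# Use the lenient built-in parser for the Mailman archive pages
# (the .htm extension is an assumption; the original config lists only .html)
IndexContents HTML .html .htm

# Store the start of <body> as the description property
# (200 is an arbitrary example size)
StoreDescription HTML <body> 200
```

With HTML in place of HTML*, libxml2 is not used for these files, so the
stricter parsing errors should not come up.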
--
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
Received on Thu Nov 29 11:16:36 2007