On Mon, Feb 05, 2007 at 12:25:24AM -0800, Jordan Hayes wrote:
> I upgraded to 2.4.5 (from 2.4.3) today, but none of my Mailman archives
> will index anymore.
>
> I've narrowed it down to this:
>
> ./001946.html:3: error: htmlParseEntityRef: expecting ';'
> <A HREF="mailto:joe%40blow.org?Subject=Foo&In-Reply-To=">Hi</A>
> ^
Yes, I've seen that. It's an invalid entity according to libxml2: the
bare "&" in the HREF should be written as "&amp;".
Are you sure it's what is causing your indexing to fail? The error is
reported, but parsing continues.
$ cat test.html
<html>
<head>
<title>hello</title>
</head>
<body>
here is some text <A HREF="mailto:joe%40blow.org?Subject=Foo&In-Reply-To=">Hi</A>
other text
</body>
</html>
$ swish-e -i test.html -T indexed_words
Indexing Data Source: "File-System"
Indexing "test.html"
Adding:[1:swishdefault(1)] 'hello' Pos:5 Stuct:0x7 ( HEAD TITLE FILE )
test.html:6: error: htmlParseEntityRef: expecting ';'
here is some text <A HREF="mailto:joe%40blow.org?Subject=Foo&In-Reply-To=">Hi</
^
Adding:[1:swishdefault(1)] 'here' Pos:11 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'is' Pos:12 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'some' Pos:13 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'text' Pos:14 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'hi' Pos:15 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'other' Pos:16 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'text' Pos:17 Stuct:0x9 ( BODY FILE )
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 7 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
7 unique words indexed.
4 properties sorted.
1 file indexed. 161 total bytes. 8 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
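
If the warnings themselves are the nuisance, one option might be to escape the
bare ampersands in the archive files before they are indexed. Below is a
minimal sketch in Python, not part of Swish-e or Mailman. The function name
escape_bare_ampersands is hypothetical, and the regex is a rough assumption
about what counts as a well-formed entity reference (named, decimal, or hex,
ending in ";"). It rewrites every other "&" as "&amp;" across the whole text,
so it is a crude filter rather than a real HTML fixer.

```python
import re

# An '&' that does NOT start a well-formed entity reference:
#   named   (&amp;), decimal (&#38;), or hex (&#x26;), each ending in ';'.
# This pattern is an assumption for illustration, not libxml2's own rule.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(html: str) -> str:
    """Replace each bare '&' with '&amp;', leaving existing entities alone."""
    return BARE_AMP.sub("&amp;", html)


# The anchor from the test file in this message.
sample = '<A HREF="mailto:joe%40blow.org?Subject=Foo&In-Reply-To=">Hi</A>'
fixed = escape_bare_ampersands(sample)
print(fixed)
```

Entities that are already well formed pass through unchanged, so running the
filter twice over the same file should not double-escape anything.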
--
Bill Moseley
moseley@hank.org
Received on Mon Feb 5 06:27:57 2007