To follow up:
I upgraded to libxml2-2.5.7 (that's the last version
that will build on my RH6.1 production machine*), and
I still get the same errors during indexing:
http://www.gnu.org/testimonials/testimonials.ca.html
input conversion failed due to input error
Bytes: 0xC4 0x3C 0x2F 0x41
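About the byte dump: one plausible reading is that the page contains a Latin-1 character that is being decoded as UTF-8. This is an assumption; libxml2's message does not say which encoding it was converting from. 0xC4 starts a two-byte UTF-8 sequence, so the next byte would have to be in the range 0x80-0xBF, but 0x3C is '<'. A quick Python 3 check of just those four bytes (the helper name `utf8_error` is made up for illustration):

```python
# The four bytes libxml2 reported for testimonials.ca.html
reported = bytes([0xC4, 0x3C, 0x2F, 0x41])

def utf8_error(data):
    """Return (offset, reason) for the first invalid UTF-8 sequence, or None.

    Assumption: the page is being decoded as UTF-8; libxml2 does not say so.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None

print(utf8_error(reported))        # (0, 'invalid continuation byte')

# Read as ISO-8859-1 instead, every byte maps to a character.
print(reported.decode("latin-1"))  # Ä</A
```

If that reading is right, the bytes would be an 'Ä' followed by the start of a closing `</A` tag.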
[In my last email I said that
http://www.gnu.org/testimonials/supported.html had the
above problem. I believe stdout and stderr were
interleaved in the wrong order that time, and that it
was actually testimonials.ca.html, above, that caused
the error.]
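A way to check a saved copy of a page without relying on stdout/stderr ordering: the Python 3 sketch below reports the offset of the first byte sequence that is not valid UTF-8, printing up to four bytes in the same style as libxml2's "Bytes:" line. The function name is hypothetical. It only tests for UTF-8, so a page correctly served as EUC-KR (such as the OpenBSD Korean page) will be flagged too; treat it as a triage aid, not a reproduction of libxml2's own encoding detection.

```python
import sys

def first_bad_utf8(data, context=4):
    """Return (offset, hex bytes) for the first invalid UTF-8 sequence, or None.

    Only UTF-8 is checked; files in other legitimate encodings
    (EUC-KR, Shift_JIS, ...) will also be reported.
    """
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # Show a few bytes from the failure point, libxml2-style.
        window = data[err.start:err.start + context]
        return err.start, " ".join(f"0x{b:02X}" for b in window)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            result = first_bad_utf8(fh.read())
        if result is None:
            print(f"{path}: valid UTF-8")
        else:
            offset, shown = result
            print(f"{path}: offset {offset}, Bytes: {shown}")
```

Pass it the saved HTML files as arguments. If the goal is to ignore such bytes rather than report them, `data.decode("utf-8", errors="replace")` substitutes U+FFFD for each bad sequence.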
* libxml2 versions after 2.5.7 (including 2.6.0) fail
to build with the error

  nanoftp.c: In function `xmlNanoFTPGetConnection':
  nanoftp.c:1556: structure has no member named `ss_family'
  ...
Thought the SWISH-E developers would want to know.
Best,
jrobinson
--- J Robinson <jrobinson852@yahoo.com> wrote:
>
> --- Bill Moseley <moseley@hank.org> wrote:
> > On Sun, Oct 26, 2003 at 09:46:14AM -0800, J Robinson wrote:
> > > It seems that Korean, Japanese, and other Asian pages
> > > are especially likely to cause the error (no surprise
> > > there). I found some publicly available examples:
> > >
> > > http://www.openbsd.com/ko/donations.html
> > > input conversion failed due to input error
> > > Bytes: 0xB8 0x00 0x20 0xBE
> > >
> > > But even some 'english' pages exhibit the error:
> > >
> > > http://www.gnu.org/testimonials/supported.html
> > > input conversion failed due to input error
> > > Bytes: 0xC4 0x3C 0x2F 0x41
> >
> > moseley@bumby:~$ od -t x1 supported.html | grep -i c4
> > moseley@bumby:~$
> >
> > What version of libxml2 do you have? I don't see
> > that error.
> >
> > moseley@bumby:~$ xml2-config --version
> > 2.5.11
> >
> > I don't get the errors even with
> > http://www.openbsd.com/ko/donations.html. If I set
> > ParserWarnLevel I do get a lot of
> >
> > warning: Failed to convert internal UTF-8 to Latin-1.
> > Replacing non ISO-8859-1 char with char ' '
> >
>
> I'm using 2.4.28, built from source on RH6.1:
> % xml2-config --version
> 2.4.28
>
> I'll try upgrading libxml2.
>
> Still, it would be cool if SWISH-E showed the URI
> with the error message and/or indicated that the error
> came from libxml2. (I found that out from googling.)
>
> > > Any ideas on the best way to detect and ignore
> > > multi-byte content?
> >
> > Libxml2 is supposed to detect the encoding and
> > convert to UTF-8 internally.
>
> Ah. Good to know.
>
> Best,
> jrobinson
>
> __________________________________
> Do you Yahoo!?
> Exclusive Video Premiere - Britney Spears
> http://launch.yahoo.com/promos/britneyspears/
Received on Mon Oct 27 03:27:25 2003