Re: Error '1' converting internal UTF-8 to Latin-1

From: Michael Peters <mpeters(at)not-real.plusthree.com>
Date: Tue Nov 30 2004 - 14:48:53 GMT
Roman Chyla wrote:
> hi,
> this was a post from Bernhard Weishum from 16.11.04

Thank you very much. I did search the archives but did not find this 
originally. I had dropped off the list for a bit and I guess that's 
when I missed it. It's exactly what I was thinking. Any idea when it 
will be incorporated? I'd hate to maintain a patched local version if 
an official release with the fix is on the horizon.

> Hi *,
> 
> if you are using fedora core 2 or 3 with automatic updates installed,
> be prepared for lots of warnings during indexing with swish-e.
> The evil geniuses at xmlsoft.org changed all conversion functions to no
> longer return 0 on success (as before and still documented), but to
> return the number of bytes processed instead. This also affects
> UTF8Toisolat1(), which swish-e uses for its internal conversion.
> As a result, swish-e (I checked only 2.4.2) spews lots of warnings about
> failed conversions although they went fine.
> 
> The rationale for the API change is described here:
>     http://bugzilla.gnome.org/show_bug.cgi?id=153937
> 
> Extremely trivial (and therefore yet untested) patch against cvs below.
> 
> regards,
> bkw
> 
> 
> Index: src/parser.c
> ===================================================================
> RCS file: /cvsroot/swishe/swish-e/src/parser.c,v
> retrieving revision 1.42.2.2
> diff -u -r1.42.2.2 parser.c
> --- src/parser.c	23 Sep 2002 13:24:37 -0000	1.42.2.2
> +++ src/parser.c	16 Nov 2004 19:16:37 -0000
> @@ -866,7 +866,7 @@
>           if ( used > 0 )         // tally up total bytes consumed
>               buf->cur += used;
> 
> -        if ( ret == 0 )         // all done
> +        if ( ret >= 0 )         // all done
>               return;
> 
>           if ( ret == -2 )        // encoding failed
> 
> 
> 
> 
> 
> Michael Peters wrote:
> 
>>hello all,
>>
>>I've been using swish-e for a while now and have not seen this come up 
>>before. I'm trying to run existing code on a new setup and it works 
>>without any problems, but I keep getting this warning on indexing...
>>
>>     Error '%d' converting internal UTF-8 to Latin-1
>>
>>It always seems to happen on the '<' character, but not every '<' 
>>character, just most. I know swish-e is using libxml2 to parse the 
>>templates and it can't use UTF-8 so it has libxml2 convert it to Latin-1. 
>>The warning message comes from parser.c line 899 because the libxml2 
>>function UTF8Toisolat1() is returning not '0' but some other positive 
>>value (usually '1').
>>
>>Now according to this
>>
>>http://xmlsoft.org/html/libxml-encoding.html#UTF8Toisolat1
>>
>>it seems that if UTF8Toisolat1() returns a positive value it means that 
>>it's still valid and refers to the number of octets consumed. Should 
>>this emit a warning in this case?
>>
>>The problem is that my test suite passes and everything works, but the 
>>test suite output is garbled by the output of the indexing script that 
>>gets run before a certain group of tests.
>>
>>If I change parser.c 871 from
>>     if ( ret == 0 )         // all done
>>to
>>     if ( ret >= 0 )         // all done
>>
>>then everything works just fine. Any suggestions? Does this break 
>>anything else? Also, the old setup was using libxml2 version 2.6.8 and 
>>the new setup has 2.6.16 if that helps.
>>
>>Thanks
> 
> 
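
For anyone reading this in the archive, here is a minimal sketch of why the old check misfires. It does not call libxml2: demo_convert() is a hypothetical stand-in that only mimics the return convention described in the quoted messages (a non-negative count on success, -2 when the encoding fails, another negative value otherwise). The helpers is_success_old() and is_success_new() are illustrative names too; they are not part of swish-e.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a libxml2-style converter. This is NOT the
 * real UTF8Toisolat1(); it only copies ASCII bytes so that the return
 * convention can be shown in isolation:
 *   >= 0  number of bytes written (success)
 *   -2    a byte could not be converted
 *   -1    output buffer too small
 */
static int demo_convert(unsigned char *out, int outlen,
                        const unsigned char *in, int inlen)
{
    int written = 0;
    int i;

    assert(out != NULL && in != NULL);

    for (i = 0; i < inlen; i++) {
        if (in[i] >= 0x80)
            return -2;              /* not handled by this toy converter */
        if (written >= outlen)
            return -1;              /* no room left in the output buffer */
        out[written++] = in[i];
    }
    return written;
}

/* Old check: only a return value of exactly 0 counts as success. */
static int is_success_old(int ret) { return ret == 0; }

/* Patched check: any non-negative return value counts as success. */
static int is_success_new(int ret) { return ret >= 0; }
```

Converting a single '<' with demo_convert() returns 1, which the old check treats as a failure and the patched check accepts. That matches the "Error '1'" warning in the subject line.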

-- 
Michael Peters
Developer
Plus Three, LP
Received on Tue Nov 30 06:48:54 2004