On Mon, Nov 22, 2004 at 04:07:13AM -0800, Roman Chyla wrote:
> thank you for the link - I played with configuration, but I am afraid
> the hints from FAQ can't solve my problem in Windows-1250, nor in
> Iso-8859-2 encoding when using libxml2 parser.
>
> I tried also "TranslateCharacters" option, but since the UTF is 16 bit I
> can not map it to 8bit characters (did I miss something?)
UTF-8 is a variable-width encoding (one to four bytes per character),
but yes, that's correct: ISO-8859-1 is an 8-bit character set and of
course cannot represent all the characters that UTF-8 can.
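To make the width point concrete, here is a minimal C sketch. It is illustrative only, not swish-e or libxml2 code, and the function names are hypothetical. A UTF-8 lead byte tells you how many bytes the character occupies, and counting only the bytes that are not continuation bytes gives the character count:

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence introduced by lead byte b,
 * or 0 if b cannot start a sequence (a 10xxxxxx continuation byte
 * or an invalid byte). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: U+0000..U+007F  */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: up to U+07FF    */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: up to U+FFFF    */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: up to U+10FFFF  */
    return 0;
}

/* Count characters (code points) in a NUL-terminated UTF-8 string by
 * counting every byte that is NOT a continuation byte (10xxxxxx).
 * Assumes the input is well-formed UTF-8. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}
```

For example, the Czech word "Příliš" is 6 characters but 9 bytes in UTF-8, while in ISO-8859-2 or Windows-1250 each of those letters is a single byte. Letters such as ř (U+0159) lie above U+00FF, so ISO-8859-1 has no byte for them at all.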
Since swish-e is 8-bit internally, it has to convert to an 8-bit
encoding when reading from libxml2. (Libxml2 outputs UTF-8.)
Because libxml2 provides a function to convert to 8859-1, and that is
the encoding most swish-e users have used in the past, 8859-1 was the
one chosen.
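Roughly, the UTF-8 to 8859-1 step amounts to the following. This is a hand-written sketch under my own assumptions, not the code in parser.c and not libxml2's converter; `utf8_to_latin1` is a hypothetical name. It mirrors the current behavior of replacing anything outside U+0000 through U+00FF (and any malformed byte sequence) with a space:

```c
#include <assert.h>
#include <stddef.h>

/* Convert NUL-terminated UTF-8 text `in` to ISO-8859-1 in `out`.
 * `out` needs room for strlen(in) + 1 bytes: the output is never
 * longer than the input. Code points above U+00FF and malformed
 * sequences become a space. Returns the number of bytes written. */
static size_t utf8_to_latin1(const char *in, unsigned char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t o = 0;

    assert(in != NULL && out != NULL);
    while (*s != '\0') {
        unsigned long cp;
        size_t len, i;

        /* Decode the lead byte: payload bits and sequence length. */
        if (*s < 0x80)                { cp = *s;        len = 1; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; len = 2; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; len = 3; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; len = 4; }
        else { out[o++] = ' '; s++; continue; }  /* stray/invalid byte */

        /* Append 6 bits from each continuation byte (10xxxxxx). */
        for (i = 1; i < len; i++) {
            if ((s[i] & 0xC0) != 0x80)
                break;                 /* malformed or truncated */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        if (i < len) {                 /* emit a space, resync at s[i] */
            out[o++] = ' ';
            s += i;
            continue;
        }

        /* ISO-8859-1 byte values equal code points U+0000..U+00FF. */
        out[o++] = (cp <= 0xFF) ? (unsigned char)cp : ' ';
        s += len;
    }
    out[o] = '\0';
    return o;
}
```

Fed "Café Příliš", it keeps é (0xE9) and í (0xED) but turns ř and š into spaces, which is the loss Roman is seeing with Central European text.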
> perhaps, there could be a new TranslateCharactersUTF directive for users
> with libxml2 and non-8859-2 characters in docs?
I suppose that would be possible. Currently, when there's an encoding
error the character is replaced with a space, but parser.c could be
hacked to check whether a UTF-8 character should be mapped to another
UTF-8 character before it is encoded as 8859-1.
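A sketch of that hack, assuming a hypothetical TranslateCharactersUTF directive that supplies a sorted table of code point pairs. The directive, the table, and the function names are all made up for illustration; nothing like this exists in swish-e today. Each decoded character is looked up first and only then checked against the 8859-1 range:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical translation table, standing in for what a
 * "TranslateCharactersUTF" directive might load from the config.
 * Entries must stay sorted by `from` for the binary search below. */
struct cp_map { unsigned long from, to; };

static const struct cp_map translate_table[] = {
    { 0x010D, 'c' },  /* c with caron */
    { 0x011B, 'e' },  /* e with caron */
    { 0x0159, 'r' },  /* r with caron */
    { 0x0161, 's' },  /* s with caron */
    { 0x016F, 'u' },  /* u with ring above */
    { 0x017E, 'z' },  /* z with caron */
};

/* Binary-search the table; return the mapped code point, or cp
 * unchanged when there is no entry for it. */
static unsigned long translate_cp(unsigned long cp)
{
    size_t lo = 0;
    size_t hi = sizeof translate_table / sizeof translate_table[0];

    assert(cp <= 0x10FFFF);            /* must be a Unicode code point */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (translate_table[mid].from == cp)
            return translate_table[mid].to;
        if (translate_table[mid].from < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return cp;
}

/* Per-character step for the converter: translate first, then fall
 * back to a space if the result still does not fit ISO-8859-1. */
static unsigned char to_latin1_byte(unsigned long cp)
{
    cp = translate_cp(cp);
    return cp <= 0xFF ? (unsigned char)cp : ' ';
}
```

With this table, ř and š come out as plain r and s instead of spaces. A binary search is used on the guess that a real table could hold a few hundred entries; for a short list a linear scan would do just as well.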
A rewrite of swish-e to use UTF-8 internally would be best, of course.
That's not a new idea.
--
Bill Moseley
moseley@hank.org
swish-e@sunsite.berkeley.edu
Received on Mon Nov 22 06:58:12 2004