On 09/13/2007 10:09 AM, Peter Karman wrote:
> I guess then the onus is on Perl to deal with mismatched encodings. It actually
> seems to do that reasonably well in some cases, horribly in others. The biggest
> issue I've seen is when it interprets bytes intended as UTF-8 as Latin1. There
> are cases where the same sequence of bytes is valid in both encodings, and Perl
> seems to assume Latin1 as the default.
>
I should add that, as an example, you can look at my Search::Tools::UTF8 CPAN
module. IIRC, it includes some tests that can be useful when unraveling the
Perl UTF-8 maze.
http://search.cpan.org/~karman/Search-Tools-0.10/lib/Search/Tools/UTF8.pm
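
For a concrete picture of the ambiguity described in the quoted text, here is a
minimal sketch. It uses only the core Encode module, not Search::Tools::UTF8,
and the variable names are illustrative assumptions. It takes one two-byte
sequence, decodes it both ways, and then shows what happens when the bytes are
never decoded at all.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two bytes that are the UTF-8 encoding of U+00E9 (e with acute accent).
# The same two bytes are also a valid Latin-1 string.
my $bytes = "\xC3\xA9";

# Decoded as UTF-8, the bytes yield one character.
my $as_utf8 = decode('UTF-8', $bytes);

# Decoded as Latin-1, the same bytes yield two characters (U+00C3, U+00A9).
my $as_latin1 = decode('ISO-8859-1', $bytes);

printf "as UTF-8:   %d char(s), first is U+%04X\n",
    length($as_utf8), ord($as_utf8);
printf "as Latin-1: %d char(s), first is U+%04X\n",
    length($as_latin1), ord($as_latin1);

# Without an explicit decode, Perl treats each byte as one Latin-1
# character, so encoding the raw string to UTF-8 double-encodes it.
my $double = encode('UTF-8', $bytes);
printf "double-encoded length: %d bytes\n", length($double);
```

The last step is probably the failure mode most people hit: bytes that were
meant as UTF-8 but never decoded get treated as Latin-1 characters and are
encoded a second time on output.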
--
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Thu Sep 13 11:39:00 2007