On Wed, 29 Jan 2003, David Cogley wrote:
> On Wed, 29 Jan 2003, Bill Moseley wrote:
>
> > What's your LANG environment variable set to?
>
> en_US.UTF-8
Try:
    export LANG=en_US
See http://swish-e.org/archive/4870.html
Or, better, remove that tr/// line.
I've been meaning to ask on a Perl list why that error occurs. There's
some discussion of it on perlmonks.
I wish I understood what's happening. My guess has been that the text is
in UTF-8 and the tr/// is operating on bytes instead of characters, which
breaks the character encoding. That's hard to believe, though. I just have
not spent the time to grok all the encoding stuff in Perl yet.
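Here is a minimal sketch of that guess, written in Python rather than Perl
purely for illustration. The sample string and the translations are made up
and are not the actual tr/// line from the script. It only shows how a
byte-wise replacement can split a multi-byte UTF-8 character, while a
character-wise one cannot.

```python
# Hypothetical illustration only; not the tr/// from the real script.
text = "café"                 # 'é' is the single character U+00E9
raw = text.encode("utf-8")    # b'caf\xc3\xa9': 'é' occupies two bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Byte-wise: map every byte >= 0x80 to a space. One character
# turns into two spaces because it was stored as two bytes.
table = bytes(b if b < 0x80 else 0x20 for b in range(256))
bytewise = raw.translate(table)

# Byte-wise replacement that hits only the second byte of the pair
# leaves a dangling lead byte, which is no longer valid UTF-8.
half = raw.replace(b"\xa9", b" ")

# Character-wise: 'é' is handled as one unit, so nothing is split.
charwise = text.translate({ord("é"): "e"})

print(bytewise, is_valid_utf8(half), charwise)
```

If the real problem is of this kind, the fix would be to make sure the
translation sees decoded characters rather than raw bytes. That is still
an assumption about what the script is doing.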
I'm not clear on what that LANG setting is doing, or what's happening with
pdftotext. I assume that the pdftotext program is outputting UTF-8, and
that Perl also takes the text to be UTF-8 because of the LANG setting. I'm
not sure whether libxml2 would detect that it's UTF-8.
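One way to test the assumption about the converter's output would be to
capture the bytes and see whether they decode as UTF-8. The sketch below is
in Python and uses made-up sample bytes, not real pdftotext output. The
guess_encoding helper is hypothetical and is not part of any of the tools
mentioned here.

```python
def guess_encoding(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', or 'unknown-8bit'.

    A strict UTF-8 decode is a reasonable heuristic: text in a legacy
    8-bit encoding that contains high bytes rarely happens to be
    well-formed UTF-8.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"


# Made-up samples standing in for captured converter output.
sample_utf8 = "naïve".encode("utf-8")      # b'na\xc3\xafve'
sample_latin1 = "naïve".encode("latin-1")  # b'na\xefve'

print(guess_encoding(b"plain"))
print(guess_encoding(sample_utf8))
print(guess_encoding(sample_latin1))
```

This only says whether the bytes are well-formed UTF-8. It does not say
what Perl or libxml2 will assume about them.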
Can anyone explain all this encoding stuff in a few paragraphs? ;)
--
Bill Moseley moseley@hank.org
Received on Thu Jan 30 02:26:01 2003