Re: non ISO-8859-1 headers

From: David L Norris <dave(at)not-real.webaugur.com>
Date: Fri Feb 20 2004 - 22:57:43 GMT
On Fri, 2004-02-20 at 22:33, Peter Karman wrote:
> Forgive me if I am misunderstanding. This sounds like a thread that went
> by in December 2003. Search the discussion archives.

Lots of good info in that thread, if I recall.

> I believe the end result was that because swish-e currently does not save
> data in UTF-8, it can't display any of the indexed data in that format.
> If by "display the output" you mean the contents of StoreDescription from
> the swish index, then that currently can't happen. I don't think the issue
> is with the perl locale or encodings settings for the CGI scripts, I think
> it's with the data in the index, *as it was indexed*.

The HTML2 and XML2 parsers recode UTF-8 to ISO-8859-1, which drops any
characters that don't map.
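
To illustrate the effect described above, here is a minimal Python sketch of
recoding UTF-8 to ISO-8859-1 with unmappable characters dropped. It is not
SWISH-E code; the function name and the sample string are made up for the
example, and it only assumes the recode silently discards what it can't map.

```python
def recode_utf8_to_latin1(raw: bytes) -> bytes:
    """Decode UTF-8 input and re-encode it as ISO-8859-1.

    Characters with no ISO-8859-1 equivalent are silently dropped
    (assumption: this mimics a lossy recode, not SWISH-E's actual code).
    """
    text = raw.decode("utf-8")
    return text.encode("iso-8859-1", errors="ignore")


# Mixed Latin and Arabic text, stored as UTF-8.
sample = "café مرحبا".encode("utf-8")
recoded = recode_utf8_to_latin1(sample)

# The accented Latin letter survives; the Arabic word is gone.
print(repr(recoded.decode("iso-8859-1")))
```

With mixed English and Arabic input like this, only the part that fits in
ISO-8859-1 would remain after the recode.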

HTML or XML may destroy multi-byte characters (since the indexer thinks
each byte is a character).  However, single-byte encodings (ISO-8859-*)
should pass through so long as TranslateCharacters doesn't mangle them.
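
A rough Python sketch of why byte-at-a-time handling hurts multi-byte text
but not single-byte encodings. The helper name is hypothetical and this only
models "each byte is a character"; it says nothing about what SWISH-E does
internally beyond that.

```python
def bytewise_view(raw: bytes) -> str:
    """Treat every byte as one character, as a byte-oriented parser would."""
    # ISO-8859-1 maps each byte value 0-255 to exactly one character.
    return raw.decode("iso-8859-1")


utf8_bytes = "é".encode("utf-8")         # two bytes: 0xC3 0xA9
latin1_bytes = "é".encode("iso-8859-1")  # one byte: 0xE9

# The UTF-8 form is seen as two unrelated characters.
print(len(bytewise_view(utf8_bytes)), repr(bytewise_view(utf8_bytes)))

# The single-byte form is seen as the one character it really is.
print(len(bytewise_view(latin1_bytes)), repr(bytewise_view(latin1_bytes)))
```

Anything that then operates per "character" (case folding, word splitting,
character translation) can split or alter the two bytes of the UTF-8 form
independently, which is how the original character gets destroyed.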

> Tim Freedom supposedly wrote on 2/20/04 1:51 PM:
> > I have lots of files that have both English and Arabic in
> > them (UTF-8), currently I can only index the english parts (again,
> > I'm willing to help with adding UTF-8 abilities :-)

Patches are welcome.  :-)

> yet when I display the output it would be nice to default to UTF-8 to see
> both texts.

You mean for the stored description?  That may or may not work depending
on how you have SWISH-E configured.  I'd suggest testing it to make sure
multibyte characters aren't destroyed.
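
One way to run that kind of test, sketched in Python: index a document with a
known multi-byte string, capture the stored description bytes from a search
result however you normally do, and check them. The function name is made up
and how you obtain the bytes is left out on purpose.

```python
def multibyte_survived(expected: str, stored: bytes) -> bool:
    """Return True if `stored` is valid UTF-8 and still contains `expected`."""
    try:
        text = stored.decode("utf-8")
    except UnicodeDecodeError:
        # Broken or truncated multi-byte sequences end up here.
        return False
    return expected in text


# Intact UTF-8 passes the check.
print(multibyte_survived("مرحبا", "Hello مرحبا".encode("utf-8")))

# A description whose Arabic text was replaced or dropped fails it.
print(multibyte_survived("مرحبا", b"Hello ?????"))
```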

-- 
 David Norris
  http://www.webaugur.com/dave/
  ICQ - 412039



*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Fri Feb 20 14:57:44 2004