Re: non ISO-8859-1 headers

From: Tim Freedom <tim_freedom(at)>
Date: Sat Feb 21 2004 - 07:20:34 GMT
--- David L Norris <> wrote:
> > Tim Freedom supposedly wrote on 2/20/04 1:51 PM:
> > > I have lots of files that have both English and Arabic in
> > > them (UTF-8), currently I can only index the english parts (again,
> > > I'm willing to help with adding UTF-8 abilities :-)
> Patches are welcome.  :-)

Give me some direction and an outline of what needs to be done.
Someone intimately involved with the code would know which parts
need to be modified (and how to group them) to make noticeable
progress.  Such a change would have to happen in orchestrated
batches, as it has very deep implications, so I would need some
direction and some hand-holding.

> > yet when I display the output it would be nice to default to UTF-8 to
> > see both texts.
> You mean for the stored description?  That may or may not work depending
> on how you have SWISH-E configured.  I'd suggest testing it to make sure
> multibyte characters aren't destroyed.

Yeah, my stored UTF-8 is not destroyed, and I wanted to display it
correctly if/when it's listed as part of English search results.

For those who are baffled by this seemingly simple thing, assume I
index the following text (the words in caps stand for non-ASCII
UTF-8 characters):

  "hello there how are you today WELL I HOPE,
   we should talk NEXT TIME YOU ARE IN TOWN about that issue"

Since swish-e (in my setup) only indexes ASCII, the non-ASCII UTF-8
characters are effectively skipped.  When I search for 'talk', the
above paragraph is shown with 'talk' highlighted (I have it set that
way).  All I want is for the output to use the right encoding
(i.e. UTF-8) so that the "NEXT TIME..." strings display correctly too.
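
To make the behaviour concrete, here is a rough Python sketch of the
situation described above.  It is not swish-e code: the function names,
the regular expression and the sample Arabic words are all made up for
illustration.  It only shows that ASCII-only indexing and UTF-8 display
of the stored text can coexist.

```python
import re

# Hypothetical stand-in for an ASCII-only indexer (NOT swish-e's real logic).
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")


def index_ascii_words(text):
    """Return the lowercase ASCII words; non-ASCII runs are simply skipped."""
    return {word.lower() for word in ASCII_WORD.findall(text)}


def highlight(stored_bytes, term):
    """Decode the stored description as UTF-8 and wrap matches in <b> tags."""
    text = stored_bytes.decode("utf-8")
    pattern = re.compile(r"\b(%s)\b" % re.escape(term), re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", text)


# Placeholder document: the Arabic words stand in for the caps in the example.
doc = "hello there how are you today مرحبا, we should talk شكرا about that issue"

stored = doc.encode("utf-8")      # stored description, kept byte-for-byte
words = index_ascii_words(doc)    # only the English words end up searchable

if "talk" in words:
    # The Arabic text survives because the bytes are decoded as UTF-8.
    print(highlight(stored, "talk"))
```

The point of the sketch is that the search only ever touches the ASCII
words, while the display step needs nothing more than the stored bytes
being treated as UTF-8 rather than ISO-8859-1.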

Of course, I would love to index everything (including that UTF-8),
but that's a different topic and a different thread :-)



Received on Fri Feb 20 23:20:34 2004