Re: Fw: Re: 8-bit chars

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Dec 12 2003 - 15:58:58 GMT
On Fri, Dec 12, 2003 at 07:43:18AM -0800, John Angel wrote:
> > Think of your suggestion.  One document is 1250 and it includes a word
> > with the "d"-slash character.  That word gets indexed -- since the index
> > stores numbers (not characters) that stored word includes the F0 byte.
> > The next document is in 8859-1 and it includes some word with the "eth"
> > character (it's an Icelandic document, I suppose) and that gets indexed,
> > and again there's a word that includes byte F0 in the index.
> >
> > Now you have a value in the index "F0" that represents more than one
> > character.  So when searching are you looking for a 1250 char or 8859-1
> > char?  You can't tell.
> 
> It doesn't matter, as long as you find that character.
> 
> Why doesn't it matter? Because I will put the charset directly in the HTML.
> The search script just has to find F0 every time; it is not important which
> character that is.

Perhaps someone else can explain it better than I can.
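
For what it's worth, here is a minimal Python sketch of the ambiguity described in the quoted text. It is only an illustration: the toy byte-level index, the function name index_bytes, the file names, and the sample words are all made up for this example and are not how any real indexer stores its data.

```python
# Byte 0xF0 maps to a different character in each charset.
raw = b"\xf0"
as_cp1250 = raw.decode("cp1250")      # d with stroke, U+0111
as_latin1 = raw.decode("iso-8859-1")  # eth, U+00F0


def index_bytes(docs):
    """Hypothetical index keyed on raw bytes, ignoring each document's charset."""
    index = {}
    for name, (text, charset) in docs.items():
        for word in text.split():
            index.setdefault(word.encode(charset), []).append(name)
    return index


# Two made-up words that differ only in the character stored as 0xF0.
docs = {
    "doc-1250.html": ("me\u0111a", "cp1250"),
    "doc-8859-1.html": ("me\u00f0a", "iso-8859-1"),
}
index = index_bytes(docs)

# Both words collapse to the same key, so one search matches both documents.
hits = index[b"me\xf0a"]
print(hits)
```

The two words are different in Unicode, but once they are reduced to bytes they share a single index entry, so the index alone cannot say which character a hit on F0 refers to.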


-- 
Bill Moseley
moseley@hank.org
Received on Fri Dec 12 15:59:03 2003