Re: Fw: Re: 8-bit chars

From: John Angel <angel_john(at)>
Date: Fri Dec 12 2003 - 15:43:33 GMT
> Think of your suggestion.  One document is 1250 and it includes a word
> with the "d"-slash character.  That word gets indexed -- since the index
> stores numbers (not characters) that stored word includes the F0 byte.
> The next document is in 8859-1 and it includes some word with the "eth"
> character (it's an Icelandic document, I suppose) and that gets indexed,
> and again there's a word that includes byte F0 in the index.
> Now you have a value in the index "F0" that represents more than one
> character.  So when searching are you looking for a 1250 char or 8859-1
> char?  You can't tell.
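
For illustration, the collision described in the quoted text can be reproduced with a short Python sketch. This is not code from the indexer under discussion: the `index` dictionary, the `add_document` helper, the file names, and the two sample words are all hypothetical, chosen only so that both words encode to the same bytes.

```python
# Byte F0 maps to a different character in each charset.
BYTE = b"\xf0"
as_cp1250 = BYTE.decode("cp1250")      # U+0111, d with stroke
as_latin1 = BYTE.decode("iso-8859-1")  # U+00F0, eth

# Hypothetical byte-level index: encoded word -> names of documents containing it.
index = {}

def add_document(name, text, charset):
    """Store each word as raw bytes in the document's own charset."""
    for word in text.split():
        index.setdefault(word.encode(charset), []).append(name)

# Illustrative sample words (assumptions, not taken from real documents).
add_document("doc_1250.html", "me\u0111a", "cp1250")
add_document("doc_8859_1.html", "me\u00f0a", "iso-8859-1")

# Both words were stored under the same key, so one lookup returns both files.
hits = index[b"me\xf0a"]
```

A byte-level lookup therefore matches both documents, and nothing in the index records which character the F0 byte stood for in each one.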

It doesn't matter, as long as you find that character.

Why doesn't it matter? Because I will put the charset directly in the HTML. The
search script just has to find F0 every time; it is not important what character is
Received on Fri Dec 12 15:43:57 2003