Fw: Re: 8-bit chars

From: John Angel <angel_john(at)not-real.hotmail.com>
Date: Wed Dec 10 2003 - 21:42:21 GMT
Here it is:

section 1 of 1 of file test.htm  < uuencode 5.32 by R.E.M. >

begin 644 test.htm
M/$A434P^#0H\345402!(5%10+45154E6/2)#;VYT96YT+51Y<&4B($-/3E1%
M3E0](G1E>'0O:'1M;#L@8VAA<G-E=#U7:6YD;W=S+3$R-3`B/@T*#0H\4#Y.
M;VXM96YG;&ES:"!C:&%R<SH@\"P@GBP@YBP@Z"P@FBP@T"P@CBP@QBP@R"P@
#B@T*
`
end
sum -r/size 55860/217 section (from "begin" to "end")
sum -r/size 44476/138 entire input file
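
To see which bytes the attachment actually carries, a single uuencoded body line can be decoded by hand. The following is a minimal sketch that assumes the traditional uuencode layout (a length character followed by groups of four characters, each holding 6 bits). The names uu_val and uudecode_line are hypothetical helpers written for this illustration; they are not part of Swish-e or of any uuencode tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one uuencoded character to its 6-bit value.
   Both ' ' and '`' decode to 0 because of the & 63 mask. */
static unsigned uu_val(char c)
{
    return ((unsigned)(unsigned char)c - 32u) & 63u;
}

/* Decode one body line into out, which must have room for 63 bytes.
   Returns the number of decoded bytes, or -1 if the line is too short
   for the length it announces. */
int uudecode_line(const char *line, unsigned char *out)
{
    assert(line != NULL && out != NULL);

    size_t have = strlen(line);
    if (have == 0)
        return -1;

    int n = (int)uu_val(line[0]);               /* announced byte count */
    size_t need = 1 + 4 * (size_t)((n + 2) / 3); /* length char + groups */
    if (have < need)
        return -1;

    const char *p = line + 1;
    for (int i = 0; i < n; i += 3, p += 4) {
        unsigned a = uu_val(p[0]), b = uu_val(p[1]);
        unsigned c = uu_val(p[2]), d = uu_val(p[3]);
        unsigned char tri[3] = {
            (unsigned char)((a << 2) | (b >> 4)),
            (unsigned char)(((b & 15u) << 4) | (c >> 2)),
            (unsigned char)(((c & 3u) << 6) | d)
        };
        /* The last group may carry padding, so copy only real bytes. */
        for (int k = 0; k < 3 && i + k < n; k++)
            out[i + k] = tri[k];
    }
    return n;
}
```

Calling uudecode_line("#B@T*", out) on the last data line above yields the three bytes 0x8A, 0x0D, 0x0A. In Windows-1250, 0x8A is the letter Š; in ISO-8859-1 the same value falls in the C1 control range. Whether that difference is the cause of the indexing problem is not established in this thread.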


----- Original Message ----- 
From: "John Angel" <angel_john@hotmail.com>
To: "Multiple recipients of list" <swish-e@sunsite.berkeley.edu>
Sent: Wednesday, December 10, 2003 22:36
Subject: [SWISH-E] Re: 8-bit chars


> Windows-1250 codepage example which is not working is attached.
>
> Non-English characters are not indexed at all.
>
> WordCharacters doesn't help.
>
>
>
> ----- Original Message ----- 
> From: "Bill Moseley" <moseley@hank.org>
> To: "John Angel" <angel_john@hotmail.com>
> Cc: "Multiple recipients of list" <swish-e@sunsite.berkeley.edu>
> Sent: Wednesday, December 03, 2003 20:21
> Subject: Re: [SWISH-E] 8-bit chars
>
>
> > On Wed, Dec 03, 2003 at 04:22:05AM -0800, John Angel wrote:
> > > I have added chars above ASCII 127 to WordCharacters but it still
> > > displays blanks instead of them. Where's the catch?
> >
> > You need to give an example of what's not working.
> >
> > > BTW, I have noticed that in WordCharacters there are only small caps
> > > chars.
> >
> > Yes, words are lowercased with "tolower()" as you noticed.  So only
> > lowercase characters need to be specified.
> >
> > > UTF-8 support would be great, but I understand it requires major
> > > rewrite. Is it possible to have at least full 8-bit chars support
> > > instead?
> >
> > It is full 8-bit, but there's a conversion to Latin1 when using libxml2
> > so it may not be 100% 8-bit "clean".  I have not tested that with
> > libxml2.
> >
> > BTW - First thing swish-e does when starting is:
> >
> >       setlocale(LC_CTYPE, "");
> >
> > but that's only in the binary.  (So that might result in problems when
> > people use the Swish-e API on systems with different locales -- that is,
> > tolower() might not change umlauts on indexing but would on searching.)
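
The locale point above can be sketched in a few lines of C. This is an illustration only, not Swish-e's code: fold_lower and use_ctype_locale are hypothetical helper names. The behaviour it relies on is standard: in the "C" locale tolower() changes only A-Z, so bytes above 127 pass through untouched. Whether a named locale maps them depends on the locales installed on the system.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Select the LC_CTYPE locale by name ("" means "take it from the
   environment", as in the setlocale call quoted above).
   Returns 1 on success, 0 if the locale is not available. */
int use_ctype_locale(const char *name)
{
    return setlocale(LC_CTYPE, name) != NULL;
}

/* Lowercase a byte buffer in place using the current LC_CTYPE locale.
   The buffer is unsigned char because passing a negative char value
   to tolower() is undefined behaviour. */
void fold_lower(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)tolower(buf[i]);
}
```

If an indexer folds words under one LC_CTYPE setting and a searcher folds them under another, the same 8-bit word can end up as two different byte strings, which is the mismatch described above. For example, after use_ctype_locale("C"), fold_lower leaves the bytes 0xC8 and 0x8A (both present in the attached test file) unchanged.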
> >
> > > Searching through previous posts shows that the problem could be in
> > > UTF8Toisolat1() and tolower() functions, but I am not sure how to
> > > change and fix that.
> >
> > Can you provide a specific example of the problem?
> >
> >
> >
> > -- 
> > Bill Moseley
> > moseley@hank.org
> >
> >
>
>
>
> *********************************************************************
> Due to deletion of content types excluded from this list by policy,
> this multipart message was reduced to a single part, and from there
> to a plain text message.
> *********************************************************************
>
Received on Wed Dec 10 21:42:27 2003