chars for russian words

From: Alex Guryanow <gav(at)not-real.nlr.ru>
Date: Sat Dec 04 1999 - 14:23:59 GMT

Hi,

I'm from Russia. All Russian words consist of chars with byte codes
greater than 127. In the files config.h and swish.h these chars appear
in the lines

config.h:
#define WORDCHARS ...
#define BEGINCHARS ...
#define ENDCHARS ...

swish.h:
char *indexchars = ...

These Russian chars are in the windows-1251 encoding (there are many
encodings for the Russian language). But not all chars are included in
the above lines, and some symbols are invalid. Therefore not all Russian
words are indexed and can be searched. I have changed these lines
to include all Russian symbols and attach the corrected files to this
e-mail.
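
As an illustration of which bytes are involved, here is a minimal sketch that collects the windows-1251 codes of the Russian letters into a string. The function name russian_chars_1251 is hypothetical and is not taken from the attached files. The sketch only relies on the windows-1251 layout, where the capital letters occupy 0xC0-0xDF, the small letters 0xE0-0xFF, and the letter pair "yo" sits at 0xA8 and 0xB8. How such a string would be merged into WORDCHARS, BEGINCHARS, ENDCHARS or indexchars is left open, because their actual values are not shown in this message.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fill buf with the windows-1251 byte codes of all
 * Russian letters and terminate it with '\0'.
 * buf must have room for at least 67 bytes (66 letters + terminator).
 * Returns the number of letter bytes written.
 */
size_t russian_chars_1251(char *buf)
{
    size_t n = 0;
    int c;

    buf[n++] = (char)0xA8;          /* capital "yo" */
    buf[n++] = (char)0xB8;          /* small "yo" */

    /* 0xC0..0xDF are the capital letters, 0xE0..0xFF the small ones */
    for (c = 0xC0; c <= 0xFF; c++)
        buf[n++] = (char)c;

    buf[n] = '\0';
    return n;
}
```

The bytes are cast to char explicitly because plain char may be signed on some platforms, so any comparison against these values should go through unsigned char.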

Second, I'm no guru at writing programs, and possibly the
following problem is the result of my incompetence. The standard
function
        tolower(int c)
works fine only for symbols with byte codes lower than 128, and does not
work for Russian symbols. Therefore swish-e returns different lists
of documents for the same search word written in capital and small letters.
I have solved this problem in the following way: I have created my
own function
    tolower_1251(int c)
that correctly converts capital Russian symbols to small ones, changed
the Makefile and changed the file swish.h (look at line 58 in the
attached file).
I'd be very glad if anybody could tell me how to solve this problem correctly.
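
For readers without the attachment, a function of this kind could look like the sketch below. This is not the attached code, only an assumption about its general shape: it handles ASCII letters plus the Russian letters of windows-1251, where each capital letter in 0xC0-0xDF has its small counterpart 0x20 higher, and capital "yo" (0xA8) maps to small "yo" (0xB8). Other Cyrillic letters in the code page (Ukrainian, Serbian and so on) are deliberately left untouched.

```c
/*
 * Sketch of a windows-1251 aware tolower for Russian text.
 * c must be a byte value in the range 0..255 (cast plain char to
 * unsigned char before calling, since char may be signed).
 * Bytes that are not ASCII or Russian capital letters are returned as is.
 */
int tolower_1251(int c)
{
    if (c >= 'A' && c <= 'Z')       /* ASCII capital letters */
        return c + ('a' - 'A');
    if (c >= 0xC0 && c <= 0xDF)     /* Russian capital letters */
        return c + 0x20;
    if (c == 0xA8)                  /* capital "yo" */
        return 0xB8;
    return c;
}
```

A caller would lower-case a word byte by byte, for example with word[i] = (char)tolower_1251((unsigned char)word[i]). The standard tolower takes its rules from the current locale, and in the default "C" locale it only changes ASCII letters, which matches the behaviour described above.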

Best regards,
Alex Guryanow

P.S. With all these changes swish-e works fine on Linux and Sun Solaris.

Received on Sat Dec 4 06:25:01 1999