libxml2 and non-ascii?

From: Roman Chyla <chyla(at)not-real.knihovnabbb.cz>
Date: Fri Nov 19 2004 - 14:43:10 GMT
Hi,

I have noticed that when I use libxml2 on my indexed files, special 
characters (in my case Czech characters) are stripped out.

Switching to DefaultContents HTML, together with the TranslateCharacters 
directive, solved that problem.


I tried this with the following configurations: swish-e 2.5.2 and 
swish-e 2.2 on Linux; v2.4.2 and v2.2 on Windows. Both OSes behaved the 
same way, so I expect the cause is not the configuration of the 
computers. (Am I wrong? On Linux I have LANG=cs_CZ;LANGUAGE=czech.)

best regards

roman

Below are my config and some output from -T INDEXED_WORDS.

#my config
TranslateCharacters ľ®ą©»«žŽšŠťŤ zzssttzzsstt
PropCompressionLevel 6
DefaultContents HTML
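
To illustrate what a character-for-character mapping like the TranslateCharacters line above does, here is a minimal Python sketch. It is not swish-e's implementation; the names SOURCE, TARGET and translate_chars are hypothetical, and the table assumes the intended source characters are the Czech letters ž, Ž, š, Š, ť, Ť. The sample word assumes the indexed name was originally spelled 'maršálková'.

```python
# Hypothetical sketch of a one-to-one character translation table.
# Assumption: each character in SOURCE is replaced by the character
# at the same position in TARGET; everything else is left untouched.
SOURCE = "žŽšŠťŤ"
TARGET = "zzsstt"

# str.maketrans requires both strings to have the same length.
TABLE = str.maketrans(SOURCE, TARGET)


def translate_chars(text: str) -> str:
    """Return text with every SOURCE character mapped to its TARGET."""
    return text.translate(TABLE)


if __name__ == "__main__":
    # 'š' is mapped to 's'; 'á' is not in the table and is kept.
    print(translate_chars("maršálková"))
```

With such a table, 'š' becomes 's' while accented vowels like 'á' pass through unchanged, which matches the form 'marsálková' seen in the output that indexes correctly.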


#### debug output from Windows v2.4.2 with HTML2
Indexing Data Source: "External-Program"
Indexing "perl.exe"
External Program found: C:\PERL\BIN\/perl.exe
     Adding:[1:id(10)]   'rego'   Pos:1  Stuct:0x1 ( FILE )
     Adding:[1:id(10)]   'rapid'   Pos:2  Stuct:0x1 ( FILE )
     Adding:[1:id(10)]   '025669'   Pos:3  Stuct:0x1 ( FILE )
     Adding:[1:id(10)]   '025669'   Pos:2  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:idd(16)]   '20040226'   Pos:5  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:ti(11)]   'buchlovské'   Pos:8  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:ti(11)]   'nám'   Pos:9  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:ti(11)]   'stí'   Pos:10  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:au(12)]   'mar'   Pos:13  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:au(12)]   'álková'   Pos:14  Stuct:0x85 ( META HEAD FILE )

---- this is the correct output from Linux (v2.5.2; DefaultContents HTML2); 
on Windows it would be the same
Indexing Data Source: "External-Program"
Indexing "/usr/bin/perl"
External Program found: /usr/bin/perl
     Adding:[1:id(10)]   'rego'   Pos:1  Stuct:0x1 ( FILE )
     Adding:[1:id(10)]   'rapid'   Pos:2  Stuct:0x1 ( FILE )
     Adding:[1:id(10)]   '025669'   Pos:3  Stuct:0x1 ( FILE )
     Adding:[1:id(10)]   '025669'   Pos:2  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:idd(16)]   '20040226'   Pos:5  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:ti(11)]   'buchlovské'   Pos:8  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:ti(11)]   'náměstí'   Pos:9  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:au(12)]   'marsálková'   Pos:12  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:au(12)]   'zdenka'   Pos:13  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:au(12)]   'ing'   Pos:14  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:au(12)]   'zdenka'   Pos:15  Stuct:0x85 ( META HEAD FILE )
     Adding:[1:au(12)]   'marsálková'   Pos:16  Stuct:0x85 ( META HEAD FILE )
Received on Fri Nov 19 06:43:17 2004