dsorian@linux:~/swish-e-2.4.2> head -1 test.xml
<?xml version="1.0" encoding="UTF-8"?>
dsorian@linux:~/swish-e-2.4.2> swish-e -c swish-e.conf -i test.html test.xml -T indexed_words
Adding:[1:descripcion(18)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[1:idioma(10)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[1:curso(12)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[1:asignatura(14)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[1:asignatura.nombre(15)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
. |
. diseño
.
Adding:[1:tipo(17)] 'optativa' Pos:33 Stuct:0x1 ( FILE )
Adding:[1:idioma(10)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[1:curso(12)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[1:asignatura(14)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[1:descripcion(18)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
|
diseñar
OK, I have indexed all the words, but without the non-English
characters. But why do you get 'diseñar' indexed while I get
'disenar'? My libxml2 is libxml2-2.6.11-1.
If it is impossible to index ñ because of libxml2's UTF-8
conversion, I'll translate the users' queries instead. If they want
to search for, say, 'asignatura=diseño', I'll translate it to
'asignatura=diseno'. I think that is the only way for me to make it
work.
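
A minimal sketch of that query translation, assuming the search front
end is written in Python. `fold_query` is a hypothetical helper, not
part of swish-e. It decomposes each character and drops the combining
marks, so ñ becomes n:

```python
import unicodedata


def fold_query(query: str) -> str:
    """Strip diacritics from a user query (hypothetical helper, not a swish-e API)."""
    # NFD splits 'ñ' into 'n' plus a combining tilde (U+0303).
    decomposed = unicodedata.normalize("NFD", query)
    # Drop the combining marks and keep every other character unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_query("asignatura=diseño"))  # asignatura=diseno
```

This only helps if the indexer folds the characters the same way,
which is what the output above suggests but which I have not
confirmed for every accented letter.
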
Thank you.
> I think that's why you were not able to search for some words -- they
> were never indexed by swish. In my case, libxml2 thinks it's indexing
> UTF-8 -- but ñ is not a valid UTF-8 character so it stops.
>
> But if I do this:
>
> moseley@bumby:~$ head -1 test.xml
> <?xml version="1.0" encoding="iso-8859-1" ?>
>
> moseley@bumby:~$ swish-e -c c -i test.xml -T indexed_words -v0 -T indexed_words
> Adding:[1:swishdefault(1)] 'troncal' Pos:6 Stuct:0x1 ( FILE )
> Adding:[1:swishdefault(1)] 'blah' Pos:9 Stuct:0x1 ( FILE )
> Adding:[1:swishdefault(1)] 'optativa' Pos:14 Stuct:0x1 ( FILE )
> Adding:[1:swishdefault(1)] 'diseñar' Pos:17 Stuct:0x1 ( FILE )
> Adding:[1:swishdefault(1)] 'obligatoria' Pos:24 Stuct:0x1 ( FILE )
>
>
> Now, for some reason you are not having that problem:
>
> > Adding:[2:idioma(10)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
> > Adding:[2:curso(12)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
> > Adding:[2:asignatura(14)] 'diseno' Pos:25 Stuct:0x1
>
> So are you saying that you cannot search for those words?
>
> > I want to know if the non-English chars can be indexed correctly in
> > the XML files.
>
> Libxml2 converts to UTF-8 internally then swish converts to 8859-1
> before indexing. If your "non-English" characters can pass that
> conversion, then yes.
>
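
The encoding chain described in the quoted reply can be illustrated
with a short Python sketch. It shows only how the encodings behave,
not what libxml2 or swish-e do internally. The helper names
`is_valid_utf8` and `survives_latin1` are made up for this example:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def survives_latin1(word: str) -> bool:
    """Return True if every character of `word` exists in ISO-8859-1."""
    try:
        word.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# 'diseñar' stored as ISO-8859-1: ñ is the single byte 0xF1.
latin1_bytes = "diseñar".encode("iso-8859-1")

# Read as UTF-8, 0xF1 opens a multi-byte sequence that 'a' cannot continue.
print(is_valid_utf8(latin1_bytes))        # False
# With the matching declaration the same bytes decode without error.
print(latin1_bytes.decode("iso-8859-1"))  # diseñar
# ñ exists in ISO-8859-1, so it passes the final conversion; € does not.
print(survives_latin1("diseñar"), survives_latin1("€"))  # True False
```

So a file whose bytes are ISO-8859-1 but whose declaration says UTF-8
fails at the first step, while a correctly declared file only loses
characters that have no ISO-8859-1 equivalent.
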
Received on Sat Nov 6 04:37:06 2004