hi,
in order to get nyas to work in our multilanguage search, i had to use
<?xml version="1.0" encoding="ISO-8859-1"?>
i notice that you are using
<?xml version="1.0" encoding="UTF-8"?>
have you tried ISO-8859-1?
good luck! i fought with this for a long time.
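
in case it helps, here is a rough sketch of how a UTF-8 file could be converted so that the bytes match an ISO-8859-1 declaration. this is only an illustration, not what we actually run: the function name reencode_xml is made up, and it assumes the encoding is declared with double quotes in the XML prolog and that every character in the file exists in ISO-8859-1 (otherwise the encode step raises UnicodeEncodeError).

```python
import re


def reencode_xml(data: bytes, src: str = "utf-8", dst: str = "iso-8859-1") -> bytes:
    """Re-encode an XML document and update its encoding declaration.

    Hypothetical helper: assumes the prolog uses encoding="..." with
    double quotes and that all characters fit in the target charset.
    """
    # Decode using the encoding the file is really stored in.
    text = data.decode(src)
    # Rewrite only the first encoding="..." attribute (the XML prolog).
    text = re.sub(r'encoding="[^"]*"', f'encoding="{dst.upper()}"', text, count=1)
    # Encode the text so the bytes agree with the new declaration.
    return text.encode(dst)


sample = '<?xml version="1.0" encoding="UTF-8"?>\n<nombre>Diseño</nombre>\n'.encode("utf-8")
converted = reencode_xml(sample)
```

the point is that the declaration and the actual bytes have to agree; changing only the first line of a file that is still stored as UTF-8 would not be enough.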
Brad
------------------------------------------------------------
Brad Miele
Technology Director
IPNStock
(866) 476-7862 x902
bmiele@ipnstock.com
You can make it illegal, but you can't make it unpopular.
On Sun, 7 Nov 2004 dasoso@alumni.uv.es wrote:
>
> Hi all.
>
> Ok Bill, I commented out WordCharacters.
>
> dsorian@linux:~/swish-e-2.4.2> cat swish-e.conf
>
> IndexDir /usr/local/jakarta-tomcat-4.1.18-LE-jdk14/webapps/cocoon/webs/borrame/kk
> (test.html and test.xml are the only files in the dir)
>
> UndefinedXMLAttributes auto
> UndefinedMetaTags auto
>
> IndexOnly .xml .html .htm
>
> IndexReport 3
> ParserWarnLevel 9
>
> IndexContents XML* .xml
> IndexContents HTML2 .html .htm
>
> TranslateCharacters :ascii7:
> #WordCharacters 0123456789abcdefghijklmnñopqrstuvwxyzáéíóúàèòÇ
>
>
>
> dsorian@linux:~/swish-e-2.4.2> swish-e -c swish-e.conf -T indexed_words
>
>
>
> Adding:[1:descripcion(18)] 'blah' Pos:20 Stuct:0x1 ( FILE )
> Adding:[1:idioma(10)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
> Adding:[1:curso(12)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
> Adding:[1:asignatura(14)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
> Adding:[1:asignatura.nombre(15)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
> Adding:[1:idioma(10)] 'bases' Pos:26 Stuct:0x1 ( FILE )
>
>
>
>
> test.html - Using HTML2 parser - Adding:[2:swishdefault(1)] 'disea' Pos:2 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'o' Pos:3 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'disea' Pos:4 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'ar' Pos:5 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'sea' Pos:6 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'ales' Pos:7 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'escoa' Pos:8 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'ado' Pos:9 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'matraz' Pos:10 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'nia' Pos:11 Stuct:0x9 ( BODY FILE )
> Adding:[2:swishdefault(1)] 'o' Pos:12 Stuct:0x9 ( BODY FILE )
> (11 words)
>
>
> dsorian@linux:~/swish-e-2.4.2> swish-e -c swish-e.conf -T parsed_words
>
>
>
> test.xml - Using XML2 parser
>
>
>
> White-space found word 'Blah.'
> White-space found word 'Dise?' <--the white blanks appear like a square char
> White-space found word 'de'
> White-space found word 'bases'
> White-space found word 'de'
> White-space found word 'datos'
> White-space found word '4'
> White-space found word 'Optativa'
> White-space found word 'Dise?r.' <--- here too
> White-space found word 'segundo'
> White-space found word 'Base'
> White-space found word 'de'
> White-space found word 'datos'
> White-space found word '2'
> White-space found word 'Troncal'
> (17 words)
>
> test.html - Using HTML2 parser - White-space found word 'diseño'
> White-space found word 'diseñar'
> White-space found word 'señales'
> White-space found word 'Escoñado'
> White-space found word 'matraz'
> White-space found word 'niño'
> (11 words)
>
>
>
> So the search for diseño in test.html works perfectly thanks to
> HTML2.
>
> dsorian@linux:~/swish-e-2.4.2> swish-e -w diseño
> # SWISH format: 2.4.2
> # Search words: diseño
> # Removed stopwords:
> # Number of hits: 1
> # Search time: 0.001 seconds
> # Run time: 0.024 seconds
> 1000 /usr/local/jakarta-tomcat-4.1.18-LE-jdk14/webapps/cocoon/webs/borrame/kk/test.html "test.html" 78
>
>
>
> dsorian@linux:~/swish-e-2.4.2> swish-e -w 'asignatura.nombre=diseño'
> # SWISH format: 2.4.2
> # Search words: asignatura.nombre=diseño
> # Removed stopwords:
> err: no results
>
>
> dsorian@linux:~/swish-e-2.4.2> swish-e -w 'asignatura.nombre=diseno'
> # SWISH format: 2.4.2
> # Search words: asignatura.nombre=diseno
> # Removed stopwords:
> # Number of hits: 1
> # Search time: 0.001 seconds
> # Run time: 0.023 seconds
> 1000 /usr/local/jakarta-tomcat-4.1.18-LE-jdk14/webapps/cocoon/webs/borrame/kk/test.xml "test.xml" 671
>
>
>
> It seems I will not have problems searching the .html files.
>
> linux:/usr/... # head -1 test.xml
> <?xml version="1.0" encoding="UTF-8"?>
>
> You said that the search for diseño and diseno should match, but it
> doesn't. Why?
>
>
>
> Thank you.
>
> David Soriano.
>
>
>
>
>
>
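
one possible explanation for the last question, offered as a guess rather than a diagnosis: if the query bytes arrive in a different encoding than the one the accent folding expects, the ñ never gets mapped to n and so cannot match the indexed 'diseno'. the sketch below imitates that with Python's unicodedata module. the function fold_to_ascii is a made-up stand-in for the idea behind TranslateCharacters :ascii7: and is not how swish-e implements it.

```python
import unicodedata


def fold_to_ascii(word: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks.

    Hypothetical stand-in for accent folding; not swish-e's own code.
    """
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Correctly decoded text folds to the form found in the index.
folded_ok = fold_to_ascii("diseño")

# The same word stored as UTF-8 but read as ISO-8859-1: the two bytes of
# the ñ are seen as two separate Latin-1 characters.
misread = "diseño".encode("utf-8").decode("iso-8859-1")
folded_bad = fold_to_ascii(misread)
```

here folded_ok is 'diseno', while folded_bad is a different string, so a lookup against 'diseno' would fail for the misread query. checking which encoding the terminal sends for the -w argument would confirm or rule this out.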
Received on Sun Nov 7 19:24:34 2004