> On Sat, Nov 06, 2004 at 04:35:24AM -0800, dasoso@alumni.uv.es wrote:
> > Adding:[1:descripcion(18)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
> > |
> > diseñar
> >
> >
> > OK, I have indexed all the words, but without the non-English
> > characters. But why do you get diseñar indexed while I get disenar?
>
> Because you said in your config file
>
> TranslateCharacters :ascii7:
>
> The point of that is so you can use either diseñar or disenar
> in your query and find the same word.
>
No, Bill, it doesn't work.
I tried without TranslateCharacters :ascii7: in the config file:
UndefinedXMLAttributes auto
UndefinedMetaTags auto
IndexOnly .xml .html .htm
IndexReport 3
ParserWarnLevel 9
IndexContents XML* .xml
IndexContents HTML2 .html .htm
WordCharacters 0123456789abcdefghijklmnñopqrstuvwxyzáéíóúàèòÇ
And that's what I get:
dsorian@linux:~/swish-e-2.4.2> swish-e -c swish-e.conf -i test.html test.xml -T indexed_words
Indexing "test.html"
Checking file "test.html"...
test.html - Using HTML2 parser - Adding:[1:swishdefault(1)] 'diseño' Pos:2 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'señales' Pos:3 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'niño' Pos:4 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'perro' Pos:5 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'leña' Pos:6 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'piña' Pos:7 Stuct:0x9 ( BODY FILE )
(6 words)
Indexing "test.xml"
Adding:[2:descripcion(18)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'dise' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'dise' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'dise' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'dise' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'o' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'o' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'o' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'o' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'bases' Pos:27 Stuct:0x1 ( FILE )
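
To compare the two runs it can help to reduce the -T indexed_words trace to
the list of words stored under each metaname. What follows is only a rough
sketch, assuming every entry has the shape shown in the trace above; the
function name words_by_metaname is made up for illustration and is not part
of swish-e.

```python
import re
from collections import defaultdict

# Shape of one trace entry, copied from the output above:
#   Adding:[2:asignatura.nombre(15)] 'dise' Pos:25 Stuct:0x1 ( FILE )
# \s+ also tolerates entries that a mail client wrapped onto two lines.
ADDING = re.compile(
    r"Adding:\[(\d+):([^(\]]+)\((\d+)\)\]\s+'([^']*)'\s+Pos:(\d+)"
)


def words_by_metaname(trace):
    """Group indexed words by (file number, metaname), ordered by position."""
    grouped = defaultdict(list)
    for match in ADDING.finditer(trace):
        file_no, metaname, _meta_id, word, pos = match.groups()
        grouped[(int(file_no), metaname)].append((int(pos), word))
    return {key: [word for _, word in sorted(entries)]
            for key, entries in grouped.items()}


# Three entries taken from the trace above.
sample = """
Adding:[2:asignatura.nombre(15)] 'dise' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'o' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'bases' Pos:27 Stuct:0x1 ( FILE )
"""
result = words_by_metaname(sample)
```

Run over the full trace, this makes it easy to see that asignatura.nombre
holds 'dise' and 'o' as two separate words instead of one.
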
A plain search for diseño returns test.html only.
A search in asignatura.nombre for diseño or diseno returns nothing:
dsorian@linux:~/swish-e-2.4.2> swish-e -w 'asignatura.nombre=diseño'
# SWISH format: 2.4.2
# Search words: asignatura.nombre=diseño
# Removed stopwords:
err: no results
dsorian@linux:~/swish-e-2.4.2> swish-e -w 'asignatura.nombre=diseno'
# SWISH format: 2.4.2
# Search words: asignatura.nombre=diseno
# Removed stopwords:
err: no results
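
The 'dise' and 'o' entries in the XML trace are what one would expect if the
ñ in that file were not being treated as a word character, which would also
explain why neither query matches. That is only a reading of the output, not
a confirmed cause. A minimal sketch of the splitting effect, assuming a
simple character-set tokenizer (split_words is a hypothetical helper, not
swish-e's own parser):

```python
def split_words(text, word_characters):
    """Split text into runs of characters that appear in word_characters."""
    words, current = [], []
    for ch in text.lower():
        if ch in word_characters:
            current.append(ch)
        elif current:
            # Any character outside the set ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# The WordCharacters value from the config file above.
WORD_CHARACTERS = "0123456789abcdefghijklmnñopqrstuvwxyzáéíóúàèòÇ"

# With ñ in the set the word survives whole; without it the word is cut in two.
whole = split_words("diseño bases", WORD_CHARACTERS)
broken = split_words("diseño bases", WORD_CHARACTERS.replace("ñ", ""))
```

Here whole keeps 'diseño' as one word, while broken contains 'dise' and 'o',
the same pair that shows up in the trace.
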
And with the TranslateCharacters :ascii7: directive:
dsorian@linux:~/swish-e-2.4.2> swish-e -c swish-e.conf -i test.html test.xml -T indexed_words
Indexing Data Source: "File-System"
Indexing "test.html"
Checking file "test.html"...
test.html - Using HTML2 parser - Adding:[1:swishdefault(1)] 'disea' Pos:2 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'o' Pos:3 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'sea' Pos:4 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'ales' Pos:5 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'nia' Pos:6 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'o' Pos:7 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'perro' Pos:8 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'lea' Pos:9 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'pia' Pos:10 Stuct:0x9 ( BODY FILE )
(9 words)
Indexing "test.xml"
**Adding automatic MetaName 'descripcion' found in file 'test.xml'
Adding:[2:idioma(10)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:descripcion(18)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'bases' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'bases' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'bases' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'bases' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'datos' Pos:27 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'optativa' Pos:33 Stuct:0x1 ( FILE )
Adding:[2:tipo(17)] 'optativa' Pos:33 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[2:descripcion(18)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'segundo' Pos:42 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'segundo' Pos:42 Stuct:0x1 ( FILE )
Adding:[2:curso.numero(13)] 'segundo' Pos:42 S
And the search for asignatura=diseñar doesn't work:
dsorian@linux:~/swish-e-2.4.2> swish-e -w 'asignatura=disenar'
# SWISH format: 2.4.2
# Search words: asignatura=disenar
# Removed stopwords:
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.022 seconds
1000 test.xml "test.xml" 683
dsorian@linux:~/swish-e-2.4.2> swish-e -w 'asignatura=diseñar'
# SWISH format: 2.4.2
# Search words: asignatura=diseñar
# Removed stopwords:
err: no results
And the same happens for asignatura.nombre=diseno.
So I can't use either diseñar or disenar in the query to find the
same word.
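
For reference, this is the behaviour I understood from Bill's explanation:
the same folding is applied to the words at index time and to the query at
search time, so both spellings land on the same entry. A minimal sketch,
assuming Unicode text and Python's unicodedata module; fold_ascii,
build_index and search are hypothetical names, the two sample documents are
made up, and this is not how swish-e implements TranslateCharacters.

```python
import unicodedata


def fold_ascii(word):
    """Lowercase a word and strip its diacritics, e.g. ñ becomes n."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(docs):
    """Map each folded word to the set of documents that contain it."""
    index = {}
    for name, text in docs.items():
        for word in text.split():
            index.setdefault(fold_ascii(word), set()).add(name)
    return index


def search(index, query):
    """Fold the query exactly as the indexed words were folded."""
    return index.get(fold_ascii(query), set())


# Made-up sample content, loosely based on the words in the traces above.
docs = {
    "test.xml": "diseñar bases datos",
    "test.html": "diseño perro",
}
index = build_index(docs)
hits_accented = search(index, "diseñar")
hits_plain = search(index, "disenar")
```

With this scheme hits_accented and hits_plain are the same set, which is what
I expected from the two swish-e queries above.
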
Thank you.
David.
Received on Sat Nov 6 06:40:30 2004