> On Mon, Nov 01, 2004 at 03:40:06AM -0800, dasoso@alumni.uv.es wrote:
> > My .conf file looks like this:
> >
> > UndefinedXMLAttributes auto
> > UndefinedMetaTags auto
>
> Sure you want to do that? Seems like you will be creating a lot of
> metanames.
For the moment I want to index all the metanames.
>
> To find out why you are getting no results first use:
>
> swish-e -c config -i test.html test.xml -T indexed_words
>
> and you will notice something odd. Indexing stops in the middle of
> the XML file.
>
> Then to find out why the parser stopped processing the file turn on:
I tried it, but I don't see anything odd. It seems that every word is
indexed. The only problem appears with diseño, which is indexed as
diseno. I have ParserWarnLevel 9 in the config file. The test files and
the output I get are below.
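Putting the directives mentioned so far together, the relevant part of the
config would look roughly like the sketch below. Only these three settings
come from this thread; anything else in the file is not shown, and the
comments are just my reading of what each one does.

```
# swish-e.conf (sketch: only the directives mentioned in this thread)

# Unknown XML attributes become metanames, e.g. idioma.tipo
UndefinedXMLAttributes auto

# Unknown tags become metanames, e.g. curso
UndefinedMetaTags auto

# Verbose warnings from the libxml2-based parsers
ParserWarnLevel 9
```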
This is the test.html file:
<html>
<body>
diseño
señales
niño
perro
leña
piña
</body>
</html>
and the test.xml:
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE order SYSTEM "pedido.dtd">
<Idioma tipo="Castellano">
<curso numero="quinto">
<asignatura nombre="IPI" codigo="1">
<tipo> Troncal</tipo>
<descripcion> Blah.</descripcion>
</asignatura>
<asignatura nombre="Diseño de bases de datos" codigo="4">
<tipo> Optativa</tipo>
<descripcion> Diseñar.</descripcion>
</asignatura>
</curso>
<curso numero="segundo">
<asignatura nombre="Base de datos" codigo="2">
<tipo> Obligatoria </tipo>
<descripcion> </descripcion>
</asignatura>
</curso>
</Idioma>
dsorian@linux:~/swish-e-2.4.2> swish-e -c swish-e.conf -i test.html test.xml -T indexed_words
Indexing Data Source: "File-System"
Indexing "test.html"
Checking file "test.html"...
test.html - Using HTML2 parser - Adding:[1:swishdefault(1)] 'disea' Pos:2 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'o' Pos:3 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'sea' Pos:4 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'ales' Pos:5 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'nia' Pos:6 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'o' Pos:7 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'perro' Pos:8 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'lea' Pos:9 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'pia' Pos:10 Stuct:0x9 ( BODY FILE )
(9 words)
Indexing "test.xml"
Checking file "test.xml"...
test.xml - Using XML2 parser - **Adding automatic MetaName 'idioma' found in file 'test.xml'
**Adding automatic MetaName 'idioma.tipo' found in file 'test.xml'
Adding:[2:idioma(10)] 'castellano' Pos:3 Stuct:0x1 ( FILE )
Adding:[2:idioma.tipo(11)] 'castellano' Pos:3 Stuct:0x1 ( FILE )
**Adding automatic MetaName 'curso' found in file 'test.xml'
**Adding automatic MetaName 'curso.numero' found in file 'test.xml'
Adding:[2:idioma(10)] 'quinto' Pos:7 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'quinto' Pos:7 Stuct:0x1 ( FILE )
Adding:[2:curso.numero(13)] 'quinto' Pos:7 Stuct:0x1 ( FILE )
**Adding automatic MetaName 'asignatura' found in file 'test.xml'
**Adding automatic MetaName 'asignatura.nombre' found in file 'test.xml'
Adding:[2:idioma(10)] 'ipi' Pos:11 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'ipi' Pos:11 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'ipi' Pos:11 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'ipi' Pos:11 Stuct:0x1 ( FILE )
**Adding automatic MetaName 'asignatura.codigo' found in file 'test.xml'
Adding:[2:idioma(10)] '1' Pos:14 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] '1' Pos:14 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] '1' Pos:14 Stuct:0x1 ( FILE )
Adding:[2:asignatura.codigo(16)] '1' Pos:14 Stuct:0x1 ( FILE )
**Adding automatic MetaName 'tipo' found in file 'test.xml'
Adding:[2:idioma(10)] 'troncal' Pos:17 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'troncal' Pos:17 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'troncal' Pos:17 Stuct:0x1 ( FILE )
Adding:[2:tipo(17)] 'troncal' Pos:17 Stuct:0x1 ( FILE )
**Adding automatic MetaName 'descripcion' found in file 'test.xml'
Adding:[2:idioma(10)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:descripcion(18)] 'blah' Pos:20 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'diseno' Pos:25 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'bases' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'bases' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'bases' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'bases' Pos:26 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'datos' Pos:27 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'datos' Pos:27 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'datos' Pos:27 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'datos' Pos:27 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] '4' Pos:30 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] '4' Pos:30 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] '4' Pos:30 Stuct:0x1 ( FILE )
Adding:[2:asignatura.codigo(16)] '4' Pos:30 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'optativa' Pos:33 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'optativa' Pos:33 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'optativa' Pos:33 Stuct:0x1 ( FILE )
Adding:[2:tipo(17)] 'optativa' Pos:33 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[2:descripcion(18)] 'disenar' Pos:36 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'segundo' Pos:42 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'segundo' Pos:42 Stuct:0x1 ( FILE )
Adding:[2:curso.numero(13)] 'segundo' Pos:42 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'base' Pos:46 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'base' Pos:46 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'base' Pos:46 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'base' Pos:46 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'datos' Pos:47 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'datos' Pos:47 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'datos' Pos:47 Stuct:0x1 ( FILE )
Adding:[2:asignatura.nombre(15)] 'datos' Pos:47 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] '2' Pos:50 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] '2' Pos:50 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] '2' Pos:50 Stuct:0x1 ( FILE )
Adding:[2:asignatura.codigo(16)] '2' Pos:50 Stuct:0x1 ( FILE )
Adding:[2:idioma(10)] 'obligatoria' Pos:53 Stuct:0x1 ( FILE )
Adding:[2:curso(12)] 'obligatoria' Pos:53 Stuct:0x1 ( FILE )
Adding:[2:asignatura(14)] 'obligatoria' Pos:53 Stuct:0x1 ( FILE )
Adding:[2:tipo(17)] 'obligatoria' Pos:53 Stuct:0x1 ( FILE )
(17 words)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 24 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
24 unique words indexed.
4 properties sorted.
2 files indexed. 745 total bytes. 73 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
dsorian@linux:~/swish-e-2.4.2>
I want to know if the non-English chars can be indexed correctly in
the XML files.
Thank you.
Received on Thu Nov 4 07:15:57 2004