Hello All!
I'm trying to index this example XML file (named
"example.base"):
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE article [
<!ENTITY xmlfrag SYSTEM "other.data" >
]>
<article>
&xmlfrag;
</article>
In the same directory I have a file "other.data", which contains a lot of XML
content.
I created the following config file, "swishe_conf":
# Indexable data
IndexDir /test/swish-e_test
IndexFile /test/swish-e_test/index
IndexOnly .base
# Name and description
IndexName "Testing Index"
IndexDescription "Generated by Swish-e 2.4.0"
# XML
MetaNames hl1 hl2
PropertyNames hl1 hl2
IndexReport 3
FollowSymLinks yes
# Ranking
IgnoreTotalWordCountWhenRanking yes
IndexComments 0
# Spanish characters
TranslateCharacters áéíóúüñÁÉÍÓÚÜÑ aeiouunAEIOUUN
# Stopwords
#MinWordLimit 3
MaxWordLimit 30
Then I run:
> swish-e -v 3 -c swishe_conf -f index.tmp
Parsing config file 'swishe_conf'
Indexing Data Source: "File-System"
Indexing "/home/proy/devel/test/swish-e_test"
Checking dir "/home/proy/devel/test/swish-e_test"...
23640.base - Using DEFAULT (HTML2) parser - (1 words)
Removing very common words...
Getting IgnoreLimit stopwords: Complete
no words removed.
Writing main index...
Sorting words ...
Sorting 1 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
1 unique word indexed.
6 properties sorted.
1 file indexed. 143 total bytes. 1 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
I want swish-e to include the XML file other.data in the indexed file
and show me several keywords... but when I run:
> swish-e -f index.tmp -k '*'
# SWISH format: 2.4.0
index.tmp: xmlfrag
:(
What should I do to make swish-e include the file specified in
'ENTITY xxx SYSTEM "somefile"' when indexing XML files?
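If swish-e cannot resolve the entity itself, one workaround I can imagine is expanding it before indexing. Below is a minimal Python sketch of that idea, not anything provided by swish-e: the function name expand_external_entities is hypothetical, it assumes SYSTEM entities declared with double-quoted relative paths, it reads the included file as iso-8859-1 to match the example above, and it does not expand entities nested inside the included file.

```python
import re
from pathlib import Path

# Matches declarations such as: <!ENTITY xmlfrag SYSTEM "other.data" >
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w[\w.-]*)\s+SYSTEM\s+"([^"]+)"\s*>')
# Matches references such as: &xmlfrag;
ENTITY_REF = re.compile(r'&(\w[\w.-]*);')


def expand_external_entities(xml_text: str, base_dir: Path) -> str:
    """Replace each reference to a declared SYSTEM entity with its file's content.

    References to undeclared names (for example &amp;) are left untouched.
    """
    # Map entity name -> path as written in the DOCTYPE declaration.
    entities = dict(ENTITY_DECL.findall(xml_text))

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in entities:
            return match.group(0)
        # Paths are resolved relative to the directory of the including file.
        return (base_dir / entities[name]).read_text(encoding="iso-8859-1")

    return ENTITY_REF.sub(substitute, xml_text)
```

The expanded text could then be written to a separate file and that file indexed instead of the original, but I would prefer a way to do this inside swish-e.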
Thanks in advance for your help!!
Edgard Pineda.
Received on Wed Sep 8 15:19:58 2004