Indexing xml files that has another included xml file (using

From: Edgard Pineda <epineda(at)not-real.newtenberg.com>
Date: Wed Sep 08 2004 - 22:18:26 GMT
Hello All!
	I'm trying to index this example XML file (named
"example.base"):

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE article [
    <!ENTITY xmlfrag SYSTEM "other.data" >
]>
<article>
  &xmlfrag;
</article>

In the same directory I have the file "other.data", which contains a lot
of XML content.
I created the following config file, "swishe_conf":

# Indexable data
IndexDir /test/swish-e_test
IndexFile /test/swish-e_test/index
IndexOnly .base
 
# Name and description
IndexName "Testing Index"
IndexDescription "Generated by Swish-e 2.4.0"
 
# XML
MetaNames hl1 hl2
PropertyNames hl1 hl2
IndexReport 3
FollowSymLinks yes
 
# Ranking
IgnoreTotalWordCountWhenRanking yes
IndexComments 0
 
# Spanish characters
TranslateCharacters áéíóúüñÁÉÍÓÚÜÑ aeiouunAEIOUUN
 
# Stopwords
#MinWordLimit 3
MaxWordLimit 30


Then I run:
> swish-e -v 3 -c swishe_conf -f index.tmp
Parsing config file 'swishe_conf'
Indexing Data Source: "File-System"
Indexing "/home/proy/devel/test/swish-e_test"
 
Checking dir "/home/proy/devel/test/swish-e_test"...
  23640.base - Using DEFAULT (HTML2) parser -  (1 words)
 
Removing very common words...
  Getting IgnoreLimit stopwords: Complete
no words removed.
Writing main index...
Sorting words ...
Sorting 1 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
1 unique word indexed.
6 properties sorted.
1 file indexed.  143 total bytes.  1 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!

I want swish-e to include the XML file other.data in the indexed file
and show me several keywords... but then I run:

> swish-e -f index.tmp -k '*'
# SWISH format: 2.4.0
index.tmp: xmlfrag

:(

What should I do so that swish-e includes the file specified in
'ENTITY xxx SYSTEM "somefile"' when it indexes XML files?
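
To show the expansion I'm expecting, here is a minimal sketch outside
Swish-e, using Python's standard xml.sax parser. It only resolves external
general entities when feature_external_ges is switched on. This illustrates
the expected result and is not a Swish-e fix. The contents written to
"other.data" are a hypothetical stand-in, and the helper names are made up
for this example.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Same document as "example.base" above.
BASE_XML = """<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE article [
    <!ENTITY xmlfrag SYSTEM "other.data" >
]>
<article>
  &xmlfrag;
</article>
"""

# Hypothetical stand-in for the real "other.data" (assumption).
FRAGMENT = "<hl1>First headline</hl1><hl2>Second headline</hl2>"


class ElementCollector(ContentHandler):
    """Records the name of every element the parser reports."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def startElement(self, name, attrs):
        self.elements.append(name)


def expanded_elements(base_path):
    """Parse base_path with external entity expansion enabled."""
    parser = xml.sax.make_parser()
    # Off by default in current Python versions; must be enabled explicitly.
    parser.setFeature(feature_external_ges, True)
    handler = ElementCollector()
    parser.setContentHandler(handler)
    parser.parse(base_path)
    return handler.elements


def demo():
    """Write both files to a temporary directory and parse the base file."""
    with tempfile.TemporaryDirectory() as workdir:
        base_path = os.path.join(workdir, "example.base")
        with open(base_path, "w", encoding="iso-8859-1") as fh:
            fh.write(BASE_XML)
        with open(os.path.join(workdir, "other.data"), "w", encoding="ascii") as fh:
            fh.write(FRAGMENT)
        return expanded_elements(base_path)


if __name__ == "__main__":
    print(demo())
```

If the entity is expanded, the elements from "other.data" are reported
alongside <article>. That is what I would like swish-e to index, instead of
the single word "xmlfrag".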

Thanks in advance for your help!!

Edgard Pineda.
Received on Wed Sep 8 15:19:58 2004