Indexing XML files that have another included XML file (using

From: Edgard Pineda <epineda(at)>
Date: Wed Sep 08 2004 - 22:18:26 GMT
Hello All!
	I'm trying to index this example XML file: (named

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE article [
    <!ENTITY xmlfrag SYSTEM "" >

In the same directory I have the "" file, which has a lot of xml
I created the following config file "swishe_conf":

# Indexable data
IndexDir /test/swish-e_test
IndexFile /test/swish-e_test/index
IndexOnly .base
# Name and description
IndexName "Testing Index"
IndexDescription "Generated by Swish-e 2.4.0"
MetaNames hl1 hl2
PropertyNames hl1 hl2
IndexReport 3
FollowSymLinks yes
# Ranking
IgnoreTotalWordCountWhenRanking yes
IndexComments 0
# Spanish characters
TranslateCharacters áéíóúüñÁÉÍÓÚÜÑ aeiouunAEIOUUN
# Stopwords
#MinWordLimit 3
MaxWordLimit 30

Then I run:
> swish-e -v 3 -c swishe_conf -f index.tmp
Parsing config file 'swishe_conf'
Indexing Data Source: "File-System"
Indexing "/home/proy/devel/test/swish-e_test"
Checking dir "/home/proy/devel/test/swish-e_test"...
  23640.base - Using DEFAULT (HTML2) parser -  (1 words)
Removing very common words...
  Getting IgnoreLimit stopwords: Complete
no words removed.
Writing main index...
Sorting words ...
Sorting 1 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
1 unique word indexed.
6 properties sorted.
1 file indexed.  143 total bytes.  1 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!

I want swish-e to include the referenced XML file in the indexed file
and show me several keywords... but when I run:

> swish-e -f index.tmp -k '*'
# SWISH format: 2.4.0
index.tmp: xmlfrag


What should I do so that swish-e includes the file specified in
'ENTITY xxx SYSTEM "somefile"' when indexing XML files?
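
To make clear what I am expecting, here is a rough Python sketch of the expansion I would like to happen before the words are indexed. It is only an illustration: the function name, the regular expression and the file handling are my own assumptions and not part of swish-e, and it does plain text substitution rather than real XML parsing.

```python
import re
from pathlib import Path

# Matches declarations such as: <!ENTITY name SYSTEM "somefile" >
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w[\w.-]*)\s+SYSTEM\s+"([^"]*)"\s*>')

def inline_system_entities(xml_text, base_dir):
    """Hypothetical helper: replace each &name; reference with the contents
    of the file named in its SYSTEM declaration (path relative to base_dir)."""
    for name, system_id in ENTITY_DECL.findall(xml_text):
        if not system_id:
            # Empty system identifier: nothing to include, leave as-is.
            continue
        fragment = (Path(base_dir) / system_id).read_text(encoding="iso-8859-1")
        xml_text = xml_text.replace("&" + name + ";", fragment)
    return xml_text
```

With something like this, a document containing `&xxx;` would be indexed with the words of "somefile" in place of the entity reference.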

Thanks in advance for your help!!

Edgard Pineda.
Received on Wed Sep 8 15:19:58 2004