
RE: Indexing xml files that has another included xml file (using

From: Sean McGilloway <Kashi(at)not-real.martenlaw.com>
Date: Wed Sep 08 2004 - 22:27:50 GMT
I've tried several times to remove myself from this list without success. Please remove my address from this listserv. Thank you.

	-----Original Message----- 
	From: Edgard Pineda [mailto:epineda@newtenberg.com] 
	Sent: Wed 9/8/2004 3:17 PM 
	To: Multiple recipients of list 
	Cc: 
	Subject: [SWISH-E] Indexing xml files that has another included xml file (using
	
	

	Hello All! 
	        I'm trying to index this example XML file (named 
	"example.base"): 

	<?xml version="1.0" encoding="iso-8859-1"?> 
	<!DOCTYPE article [ 
	    <!ENTITY xmlfrag SYSTEM "other.data" > 
	]> 
	<article> 
	  &xmlfrag; 
	</article> 
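For illustration, here is a minimal Python sketch of the substitution an entity-aware XML parser would perform on the file above. It reads each `<!ENTITY name SYSTEM "file">` declaration and replaces every `&name;` reference with that file's contents. This is a naive regex-based preprocessor, not a real XML parser. It assumes the fragment files are ISO-8859-1 text with no XML declaration of their own and no nested entities. The helper name `expand_system_entities` is hypothetical.

```python
import re
from pathlib import Path

# Matches external entity declarations in the internal DTD subset,
# e.g. <!ENTITY xmlfrag SYSTEM "other.data" >
# (parameter entities such as <!ENTITY % name ...> are deliberately not matched).
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+([A-Za-z_][\w.-]*)\s+SYSTEM\s+(["\'])(.*?)\2\s*>')


def expand_system_entities(xml_text: str, base_dir: Path,
                           encoding: str = "iso-8859-1") -> str:
    """Replace each &name; reference with the contents of its SYSTEM file.

    Naive sketch (hypothetical helper): assumes the entity files are plain
    XML fragments with no <?xml ...?> declaration of their own, no nested
    entity references, and the given encoding.
    """
    entities = {}
    for name, _quote, system_id in ENTITY_DECL.findall(xml_text):
        # Relative SYSTEM identifiers are resolved against the including file's directory.
        entities[name] = (base_dir / system_id).read_text(encoding=encoding)
    # Drop the declarations so the result no longer depends on the external files.
    expanded = ENTITY_DECL.sub("", xml_text)
    for name, fragment in entities.items():
        expanded = expanded.replace(f"&{name};", fragment)
    return expanded
```

Writing the expanded text to a separate file and indexing that file instead of example.base is one way to get the fragment's words into the index without relying on the indexer to load external entities.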

	In the same directory I have the file "other.data", which contains a lot of 
	XML content. 
	I created the following config file, "swishe_conf": 

	# Indexable data 
	IndexDir /test/swish-e_test 
	IndexFile /test/swish-e_test/index 
	IndexOnly .base 
	  
	# Name and description 
	IndexName "Testing Index" 
	IndexDescription "Generated by Swish-e 2.4.0" 
	  
	# XML 
	MetaNames hl1 hl2 
	PropertyNames hl1 hl2 
	IndexReport 3 
	FollowSymLinks yes 
	  
	# Ranking 
	IgnoreTotalWordCountWhenRanking yes 
	IndexComments 0 
	  
	# Spanish characters 
	TranslateCharacters áéíóúüñÁÉÍÓÚÜÑ aeiouunAEIOUUN 
	  
	# Stopwords 
	#MinWordLimit 3 
	MaxWordLimit 30 


	Then I run: 
	> swish-e -v 3 -c swishe_conf -f index.tmp 
	Parsing config file 'swishe_conf' 
	Indexing Data Source: "File-System" 
	Indexing "/home/proy/devel/test/swish-e_test" 
	  
	Checking dir "/home/proy/devel/test/swish-e_test"... 
	  23640.base - Using DEFAULT (HTML2) parser -  (1 words) 
	  
	Removing very common words... 
	  Getting IgnoreLimit stopwords: Complete 
	no words removed. 
	Writing main index... 
	Sorting words ... 
	Sorting 1 words alphabetically 
	Writing header ... 
	Writing index entries ... 
	  Writing word text: Complete 
	  Writing word hash: Complete 
	  Writing word data: Complete 
	1 unique word indexed. 
	6 properties sorted. 
	1 file indexed.  143 total bytes.  1 total words. 
	Elapsed time: 00:00:00 CPU time: 00:00:00 
	Indexing done! 

	I want swish-e to include the XML file other.data in the indexed file 
	and show me several keywords, but when I run: 

	> swish-e -f index.tmp -k '*' 
	# SWISH format: 2.4.0 
	index.tmp: xmlfrag 

	:( 
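The indexing log above reports `Using DEFAULT (HTML2) parser` for the .base file. That suggests the document was not parsed as XML, and the unexpanded `&xmlfrag;` plausibly became the single indexed word `xmlfrag`. Mapping the .base extension to an XML parser (for example with the `IndexContents` directive) may be part of the fix. Whether swish-e's XML parser loads external entities is not confirmed here. A workaround is to expand the entities yourself and feed the result through swish-e's `-S prog` input method. The sketch below builds one record in that format. The header names (`Path-Name`, `Content-Length`, `Last-Mtime`, `Document-Type`) reflect my understanding of the swish-e 2.4 `-S prog` documentation and should be checked against your version. `format_prog_document` is a hypothetical helper.

```python
import sys
from typing import Optional


def format_prog_document(path: str, content: str, doc_type: str = "XML*",
                         mtime: Optional[int] = None,
                         encoding: str = "iso-8859-1") -> bytes:
    """Build one document record for swish-e's "-S prog" input method.

    Assumption: headers follow the swish-e 2.4 -S prog documentation
    (Path-Name, Content-Length, optional Last-Mtime and Document-Type),
    followed by a blank line and exactly Content-Length bytes of body.
    """
    body = content.encode(encoding)
    lines = [f"Path-Name: {path}",
             f"Content-Length: {len(body)}"]  # byte count, not character count
    if mtime is not None:
        lines.append(f"Last-Mtime: {mtime}")
    # "XML*" is assumed to select the best available XML parser in swish-e.
    lines.append(f"Document-Type: {doc_type}")
    header = "\n".join(lines) + "\n\n"
    return header.encode(encoding) + body


if __name__ == "__main__":
    # Hypothetical usage: emit one already-expanded document on stdout.
    expanded = "<article><hl1>Hola</hl1></article>"
    sys.stdout.buffer.write(format_prog_document("example.base", expanded))
```

A program that prints one such record per document could then be run with something like `swish-e -c swishe_conf -S prog -i ./expand.py`. The script name here is hypothetical, so confirm the invocation in your swish-e documentation.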

	What should I do so that swish-e includes the file specified in 
	'ENTITY xxx SYSTEM "somefile"' declarations in the XML files it indexes? 

	Thanks in advance for your help!! 

	Edgard Pineda. 
Received on Wed Sep 8 15:28:12 2004