Hello,
we are building very large indexes (> 1 million records).
The input consists of XML files or streams of XML files (using -S prog -i stdin).
We thought we could reduce the size of the swish-e index by using
MetaNames. Normally we use UndefinedMetaTags auto.
We believed that only strings found in XML elements
declared by MetaNames would be used for indexing.
But swish-e always indexes all words in all XML elements
(see below).
Is MetaNames really only meant to limit a search to the words
contained in that META name?
Is there a way to prevent words from being used for the index by swish-e?
Or do we have to exclude these XML elements from the input files?
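If excluding the elements from the input does turn out to be necessary, the filtering could look roughly like the Python sketch below. It is only an illustration under a few assumptions: the function name keep_only and the set of kept element names are made up for this example, it only looks at direct children of the record element, and it does not produce the -S prog output format.

```python
import xml.etree.ElementTree as ET


def keep_only(xml_text, keep):
    """Return the record with every direct child not named in `keep` removed.

    `keep_only` is a hypothetical helper for this sketch, not part of swish-e.
    """
    root = ET.fromstring(xml_text)
    # Iterate over a copy of the child list, since we remove while looping.
    for child in list(root):
        if child.tag not in keep:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


# Same shape as 1.xml in the test case below.
record = "<record>\n<id>1</id>\n<string>record1</string>\n</record>"

# Keep only the element that is declared in MetaNames.
filtered = keep_only(record, {"string"})
print(filtered)
```

The filtered record would then be handed to swish-e in place of the original one, so the contents of <id> never reach the indexer.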
Thanks a lot in advance to all the people who develop(ed) this wonderful,
easy-to-use and extremely fast tool.
Bye, Uwe
------------------------------------------------------------------
Uwe Dierolf
University of Karlsruhe - University Library
P.O.Box 6920, 76049 Karlsruhe, Germany
phone(fax) : 49/721/608-6076(4886)
www : http://www.ubka.uni-karlsruhe.de/dierolf/
------------------------------------------------------------------
xml-records in separate files
-----------------------------
1.xml
-----
<record>
<id>1</id>
<string>record1</string>
</record>
2.xml
-----
<record>
<id>2</id>
<string>record2</string>
</record>
conf file
---------
IndexDir .
IndexOnly .xml
IndexContents XML2 .xml
IndexFile ./test.index
IndexReport 1
FuzzyIndexingMode None
WordCharacters 0123456789abcdefghijklmnopqrstuvwxyz
BeginCharacters 0123456789abcdefghijklmnopqrstuvwxyz
EndCharacters 0123456789abcdefghijklmnopqrstuvwxyz
MetaNames string
PropertyNames string
index creation: swish-e -c conf
-------------------------------
Indexing Data Source: "File-System"
Indexing "."
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 4 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
4 unique words indexed.
5 properties sorted.
2 files indexed. 126 total bytes. 4 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
indexed words: swish-e -f test.index -T INDEX_WORDS_META
---------------------------------------------------------
-----> WORD INFO in index test.index <-----
1 1
2 1
record1 10
record2 10
Received on Tue Dec 14 01:53:49 2004