new version of swish-e-1.3.2-PHRASE (m)

From: Jose Manuel Ruiz <jmruiz(at)not-real.boe.es>
Date: Thu Jun 08 2000 - 18:52:23 GMT
Hi all,

Sorry for the delay, here is my last try...
swish-e-1.3.2-PHRASEm.tar.gz
Download it from http://www.boe.es/swish-e

By request, here are all the features added since the first version.

First, the good news:

New general features:
- Faster indexing and retrieval of documents (wildcard search
outperforms the old one). A hash approach has been added to speed up
searches. This reduces disk I/O. Now you can search for things like
"a* or b* or c* or d* or e* ..." without the penalty of reading the
linked list for each word of the expanded list.

- Better use of memory. Lots of calls to free memory have been added.

- Phrase search. Example:
swish-e -w "John Smith" -f index.file
(Use " to delimit the phrase.)

- XML MetaNames style. Example: <metaname1>SomeText</metaname1>
Nested XML Metanames are allowed:
<metaname1>
SomeText
<metaname2>
MoreText
</metaname2>
SomeText
</metaname1>

- Other options like filtering and some patches from different
people have been added. (See previous messages).

- Better compression of numbers.

- Portable index file.
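
To illustrate why phrase search needs word positions, here is a minimal Python sketch of a positional index. It is not the swish-e implementation (swish-e is written in C); the function names and the in-memory layout are hypothetical and only show the idea: a phrase matches when its words occur at consecutive positions in the same document.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} (hypothetical layout)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents that contain the phrase words consecutively."""
    words = phrase.lower().split()
    if not words or any(word not in index for word in words):
        return []
    # Only documents containing every word of the phrase can match.
    candidates = set(index[words[0]])
    for word in words[1:]:
        candidates &= set(index[word])
    hits = []
    for doc_id in candidates:
        # Try every occurrence of the first word as the start of the phrase.
        for start in index[words[0]][doc_id]:
            if all(start + offset in index[word][doc_id]
                   for offset, word in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    "a.html": "John Smith wrote this",
    "b.html": "Smith and John met",
    "c.html": "ask john smith",
}
index = build_positional_index(docs)
print(phrase_search(index, "John Smith"))
```

Note that b.html contains both words but not next to each other, so a plain "John and Smith" search would return it while the phrase search does not.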


----------------------------------------
New features in config file:
- New directive TranslateCharacters to translate some characters in
the words. It takes two strings: The original characters and 
the translated characters.
Example:

TranslateCharacters - aa/

This indexes the word "área" as "area" and "9-1" as "9/1".
Remember that all the characters in these strings must also be in
WordCharacters.
This option is useful for non-English languages.

- Special word in MetaNames. If you specify automatic in the
MetaNames directive, the indexer will try to extract all the MetaNames
dynamically. This option only works with these types of MetaNames:

<metaname>someContent</metaname>

and

<!-- META START NAME="keyName" --> someContent <!-- META END -->

(Nested MetaNames are allowed!!)

Sorry, it does not support:
<META NAME="keyName" CONTENT="someContent">
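
As a rough picture of what TranslateCharacters does to each word, here is a small Python sketch of a one-to-one character mapping. It is only an illustration, not the swish-e code: the function name is hypothetical, and the two strings used ("á-" and "a/") are assumed for the example.

```python
def make_translator(original_chars, translated_chars):
    """Build a table mapping the i-th original character to the i-th translated one."""
    if len(original_chars) != len(translated_chars):
        raise ValueError("both strings must have the same length")
    return str.maketrans(original_chars, translated_chars)


# Assumed example strings: "á" becomes "a" and "-" becomes "/".
table = make_translator("á-", "a/")

for word in ["área", "9-1", "test"]:
    # Characters not listed in the table are left unchanged.
    print(word, "->", word.translate(table))
```

The mapping is applied character by character, which is why both strings need the same length.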

----------------------------------------
New search options:
- Option -s to sort results by one or more document properties
(those specified in PropertyNames in the config file).
The sort is always descending.
Example:

swish-e -w test -f index.file -s cod aut

This will sort results by the properties cod and aut.

- Option -b to display results starting at the position specified;
the number of results shown is the one specified in -m.
Example:

swish-e -b 10 -m 5 -w test -f index.file 

This will show 5 results starting at the 10th position.
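
The combined effect of -s, -b and -m can be sketched in a few lines of Python. This is only a model of the behaviour described above, not the swish-e code: the function names are hypothetical, results are assumed to be dictionaries of string properties, and the first property listed is assumed to be the most significant sort key.

```python
def sort_results(results, property_names):
    """Sort by the listed properties, always descending (assumes string values)."""
    return sorted(
        results,
        key=lambda result: tuple(result.get(name, "") for name in property_names),
        reverse=True,
    )


def window(results, begin, max_results):
    """Return max_results entries starting at the 1-based position begin."""
    start = max(begin, 1) - 1
    return results[start:start + max_results]


results = [
    {"file": "a.html", "cod": "1", "aut": "x"},
    {"file": "b.html", "cod": "2", "aut": "a"},
    {"file": "c.html", "cod": "2", "aut": "z"},
]

# Model of: -s cod aut
for result in sort_results(results, ["cod", "aut"]):
    print(result["file"])

# Model of: -b 10 -m 5, applied to 20 numbered results
print(window(list(range(1, 21)), begin=10, max_results=5))
```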
-----------------------------------------

New decompress option:
- Option -D shows more information


And now, the bad news:

- This version uses more memory than the old swish-e. Like swish-e-1.3.2,
it stores all the data (words, files, properties, metanames) in memory
during the index process. But now it also stores all the word positions
in memory during the index process (positions are required for phrase
search).
- Be careful using the IgnoreLimit directive in the config file. With this
option you can get "automatic" stopwords and remove them from the index
file. The problem is that this feature is executed at the end of the
index process. So, if an automatic stopword is found, all the word
positions must be recomputed, increasing the index time (this is a pure
memory/CPU process). It is better to add these words to the IgnoreWords
directive.
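
To show why automatic stopwords are costly, here is a small Python sketch of the two steps involved. It is an assumption-laden model, not the swish-e code: the function names, the thresholds (a percentage of files and a minimum number of files) and the rule that the remaining words are renumbered without gaps are all hypothetical.

```python
def find_automatic_stopwords(doc_words, percent, min_files):
    """Words found in more than min_files files and more than percent% of all files."""
    total_files = len(doc_words)
    file_counts = {}
    for words in doc_words.values():
        for word in set(words):
            file_counts[word] = file_counts.get(word, 0) + 1
    return {
        word for word, count in file_counts.items()
        if count > min_files and count * 100 > percent * total_files
    }


def recompute_positions(doc_words, stopwords):
    """Rebuild {word: {doc_id: [positions]}} with the stopwords removed."""
    index = {}
    for doc_id, words in doc_words.items():
        position = 0
        for word in words:
            if word in stopwords:
                continue  # a removed word no longer occupies a position
            index.setdefault(word, {}).setdefault(doc_id, []).append(position)
            position += 1
    return index


doc_words = {
    "a.html": ["the", "cat", "sat"],
    "b.html": ["the", "dog"],
    "c.html": ["the", "cat"],
}
stopwords = find_automatic_stopwords(doc_words, percent=80, min_files=2)
print(stopwords)
print(recompute_positions(doc_words, stopwords))
```

The stopwords are only known once every file has been counted, so the second pass has to touch every position of every document. Words listed in IgnoreWords can be skipped while indexing, which avoids that second pass.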


Thanks to all of you for your help: Bill Moseley,
Andrew Linn, SRE, Roy Tennant, David Norris and
many others I cannot remember right now (sorry).


To do:
I am open to any suggestion.


Have a nice day 

Jose Ruiz

jmruiz@boe.es
Received on Thu Jun 8 14:58:27 2000