
Features planned for 3.0

Swish-e 3.0 (abbreviated Swish3) will be a complete overhaul of the code. You can track development progress here. Major feature improvements will include:

Unicode support
Unicode is the international standard for character encoding. Swish3 will implement support for the UTF-8 encoding of Unicode, which should handle all major languages in the world. (The original UTF-8 design could address up to 2,147,483,648 code points; the current definition in RFC 3629 restricts it to the Unicode range U+0000 to U+10FFFF, which is 1,114,112 code points.) The Swish-e developers need input from non-English language experts. Please contribute to the discussion at the Swish-e mailing list. Some significant known issues include:

lowercase vs. UPPERCASE
Version 2.x uses tolower() to lowercase all characters before searching and indexing. Should the same approach be used for UTF-8? Will this have significant impact on usability for non-English languages?
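To make the question concrete, here is a minimal sketch in Python (not Swish-e code) that compares plain lowercasing with Unicode case folding. Python's `str.lower()` is already Unicode-aware, unlike C's byte-oriented `tolower()`, so it only approximates the 2.x behavior. The function names are hypothetical.

```python
# Compare plain lowercasing with Unicode case folding.
# Illustration only; this is not how Swish-e 2.x or Swish3 is implemented.

def normalize_lower(word: str) -> str:
    """Lowercase each character (roughly what a per-character tolower() aims for)."""
    return word.lower()

def normalize_fold(word: str) -> str:
    """Apply Unicode case folding, which is designed for caseless matching."""
    return word.casefold()

# "Stra\u00dfe" is the German word Strasse written with a sharp s;
# "\u0130" is the Latin capital I with dot above, as used in Turkish.
samples = ["STRASSE", "Stra\u00dfe", "\u0130stanbul"]

for word in samples:
    print(repr(word), repr(normalize_lower(word)), repr(normalize_fold(word)))

# With lowercasing alone, the two German spellings do not match each other;
# with case folding they do.
print(normalize_lower("STRASSE") == normalize_lower("Stra\u00dfe"))
print(normalize_fold("STRASSE") == normalize_fold("Stra\u00dfe"))
```

Whether folding is the right choice still depends on the language: the Turkish dotted and dotless I, for example, follow locale-specific rules that neither function applies.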
Wildcards
Version 2.x uses an internal table to support wildcard searching with *. The table assumes 8-bit (non-Unicode) character encoding. That approach will likely need to be re-thought for multibyte encodings like UTF-8.
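The layout of the 2.x table is not described here, so the following Python sketch assumes, purely for illustration, a lookup keyed on the first byte of each word. It shows why a byte-keyed table stops distinguishing characters once words are stored as UTF-8. All function names are hypothetical.

```python
# Why a table indexed by a single byte breaks down under UTF-8.
# Assumption: the table is keyed on the first byte of a word (illustrative only).

def first_byte_key(word: str) -> int:
    """Key a word by the first byte of its UTF-8 encoding."""
    return word.encode("utf-8")[0]

def first_char_key(word: str) -> str:
    """Key a word by its first Unicode code point."""
    return word[0]

def prefix_match(pattern: str, word: str) -> bool:
    """Match a trailing-* wildcard pattern against a word, by code point."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return pattern == word

# "\u00fcber" starts with u-umlaut, "\u00e4rger" starts with a-umlaut.
words = ["\u00fcber", "\u00e4rger", "apple"]

# Both umlaut words share the UTF-8 lead byte 0xC3, so a byte-keyed table
# puts them in the same bucket even though their first letters differ.
print([hex(first_byte_key(w)) for w in words])
print([first_char_key(w) for w in words])

# Matching on code points keeps the two words apart.
print([w for w in words if prefix_match("\u00fc*", w)])
```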
Tokenizing
Version 2.x uses five different configuration options to control how a 'word' (token) is defined. The basic assumption is that a word is defined by which characters it includes. That assumption is based on a manageable character set of 256 characters. However, the sheer size of the Unicode character set makes that system unworkable. Instead, some kind of regular expression library will likely be used.
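As a rough illustration of the regular-expression approach, the Python sketch below contrasts an explicit ASCII character class with a Unicode-aware word pattern. It uses Python's standard `re` module, not whatever library Swish3 ends up using, and the function names are hypothetical.

```python
import re
import unicodedata

# An explicit character list, similar in spirit to defining a word by
# enumerating its allowed characters.
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")

# For str patterns in Python 3, \w matches Unicode letters and digits
# (plus underscore), so no character list has to be maintained.
UNICODE_WORD = re.compile(r"\w+")

def ascii_tokenize(text: str) -> list:
    """Split text into runs of ASCII letters and digits."""
    return ASCII_WORD.findall(text)

def tokenize(text: str) -> list:
    """Split text into Unicode-aware word tokens.

    The text is normalized to NFC first so that a base letter followed by
    a combining mark is treated as one precomposed character.
    """
    return UNICODE_WORD.findall(unicodedata.normalize("NFC", text))

sample = "F\u00fc\u00dfe na\u00efve \u6771\u4eac"  # German, French loanword, Japanese
print(ascii_tokenize(sample))  # the ASCII class breaks words at non-ASCII letters
print(tokenize(sample))        # the Unicode pattern keeps each word whole
```

A pattern like `\w+` is only a starting point: languages written without spaces between words, such as Japanese, need more than a character class to find word boundaries.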
Stemming
The stemmers used will need full international support.
Configuration format
Since Swish-e depends on a configuration file for StopWords, Character definitions, etc., the parsing of the configuration file must support UTF-8 as well. The current idea is to switch to XML-style configuration files and use Libxml2 to parse them.
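The sketch below shows what reading UTF-8 stop words from an XML-style configuration could look like. The element names (`swish`, `StopWords`, `word`) are invented for this example and are not a proposed Swish3 format; the parser is Python's standard `xml.etree.ElementTree`, standing in for Libxml2.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document. The element names are assumptions
# made for this example only. It is held as UTF-8 bytes, as it would be
# when read from a file.
CONFIG = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<swish>\n"
    "  <StopWords>\n"
    "    <word>the</word>\n"
    "    <word>f\u00fcr</word>\n"
    "    <word>\u0438</word>\n"
    "  </StopWords>\n"
    "</swish>\n"
).encode("utf-8")

def load_stopwords(xml_bytes: bytes) -> set:
    """Parse the document and return the stop words as Unicode strings."""
    root = ET.fromstring(xml_bytes)  # the parser honors the declared encoding
    return {el.text.strip() for el in root.iter("word") if el.text}

stopwords = load_stopwords(CONFIG)
print(sorted(stopwords))
```

The point of the design is that the XML parser takes care of decoding, so the rest of the program only ever sees decoded text.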
Incremental indexing
Swish3 will support true incremental indexing. This will allow document records to be added, modified, and deleted in an existing index. This feature may or may not build on the version 2.x experimental btree/incremental feature.
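For readers unfamiliar with the term, the toy in-memory index below sketches the three operations in Python. It says nothing about how Swish3 will store its index on disk; the class and method names are hypothetical.

```python
from collections import defaultdict

class TinyIndex:
    """Toy inverted index that can be changed without a full rebuild."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                    # document id -> set of its terms

    def add(self, doc_id: str, text: str) -> None:
        """Add a document, or replace it if the id is already indexed."""
        if doc_id in self.docs:
            self.delete(doc_id)
        terms = set(text.lower().split())
        self.docs[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id: str) -> None:
        """Remove a document and clean up postings that become empty."""
        for term in self.docs.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term: str) -> list:
        """Return the sorted ids of documents containing the term."""
        return sorted(self.postings.get(term.lower(), set()))

index = TinyIndex()
index.add("a.html", "swish indexing")
index.add("b.html", "swish search")
print(index.search("swish"))       # both documents match

index.add("a.html", "xapian backend")  # modify an existing record
print(index.search("swish"))       # only b.html still matches

index.delete("b.html")             # delete a record
print(index.search("swish"))       # no matches remain
```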
Scaling
Swish3 will reliably scale to larger (multimillion) document collections.
Indexing API
Swish3 will include an indexing API in addition to the current searching API.
Streamlined feature set
Swish3 will drop several features found in the current version:
  • Expat parsers
  • -S http indexing method and related configuration options
  • Older stemmers
  • Current native index format
Alternate index backends
Swish3 will offer alternate index backends using available open source libraries, such as Xapian, HyperEstraier, Lucene, or Lemur.