Swish-e 2.4.3 Pre Release Available

From: Bill Moseley <moseley(at)>
Date: Thu Dec 02 2004 - 18:37:57 GMT
Swish 2.4.3 Pre Release 1 is now on the download page:

It would be a huge help if you could try it out.  This version is NOT
compatible with indexes created with previous versions of swish.  That
means you will need to reindex your documents to test.

Thanks to all swish-e users who have contributed patches and reported
issues with swish.

Here's a list of changes from the CHANGES file.  Those that know me
will know to look for spelling errors....

--------- Changes since 2.4.2 ------------------------

  Version 2.4.3-pr1 - Wed Dec  1 09:52:50 PST 2004

    "Fixed" libxml2's change in UTF8Toisolat1() return value
        Bernhard Weisshuhn supplied a patch to parser.c for checking the
        return value of UTF8Toisolat1(). Seems that libxml2 now returns the
        number of characters converted instead of zero for success.


    Added swish-config and pkg-config
        Swish now provides a swish-config script and config file for the
        pkg-config utility. These tools help when building programs that
        link with the swish-e library.

        The SWISH::API Makefile.PL program uses swish-config to locate the
        installation directory of swish-e. This should make building
        SWISH::API easier when swish-e is installed in a non-standard
        location.

    Fixed rank bias in merge
        Peter van Dijk noticed that MetaNamesRank settings were not being
        copied to the output index when merging.

    Added SwishFuzzy function
        SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without
        first searching. This might be helpful for playing with queries
        prior to the search.

    Fixed translate character table
        Michael Levy found an error in the table used to translate 8859-1 to
        ascii7. Luckily, it was an upper case translation and the table is
        only used on lower case characters.

    MetaNamesRank documentation
        Changed the 'not yet implemented' caveat to 'implemented but

    Added Continuation option to config processing
        You can now use continuation lines in the config file:

            IgnoreWords \
                the \
                am \
                is \
                are \

        There may not be any characters following the backslash.
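
        As a rough illustration of the rule (this is not the swish-e
        parser, just a sketch of how continuation lines can be joined),
        the function below treats a line as continued only when the
        backslash is its very last character. The function name and the
        sample input are made up for the example.

```python
def join_continuation_lines(text):
    """Join physical lines ending in a backslash into one logical line."""
    logical = []
    pending = []  # pieces of the logical line currently being built
    for line in text.splitlines():
        if line.endswith("\\"):
            # Continued: drop the backslash and keep collecting.
            pending.append(line[:-1].strip())
        else:
            # Not continued (this includes "\ " with a trailing space).
            pending.append(line.strip())
            joined = " ".join(piece for piece in pending if piece)
            if joined:
                logical.append(joined)
            pending = []
    if pending:  # input ended while a continuation was still open
        joined = " ".join(piece for piece in pending if piece)
        if joined:
            logical.append(joined)
    return logical


config = "IgnoreWords \\\n    the \\\n    am \\\n    is \\\n    are\nIndexFile index.swish-e\n"
print(join_continuation_lines(config))
# ['IgnoreWords the am is are', 'IndexFile index.swish-e']
```

        A backslash followed by a space is not a continuation in this
        sketch, which mirrors the restriction above.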

    Fixed Buzzwords (and other word lists entered in the config)
        Words entered in config were not converted to lower case before
        storing in the index.

    Fixed metaname mapping problem in Merge
        Peter Karman found an error when merging indexes where the source
        indexes had the same metanames, but listed in a different order in
        their config files. Words would then be indexed under the wrong
        metaID number in the output index.
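
        To make the problem concrete, here is a small sketch of mapping
        metaIDs by name instead of by position. The dictionaries are
        hypothetical stand-ins for the index headers; the real merge
        code uses its own C structures.

```python
def build_id_map(source_metanames, output_metanames):
    """Map each source metaID to the output metaID that has the same name.

    Both arguments are dicts of {metaID: metaname}.
    """
    output_by_name = {name: meta_id for meta_id, name in output_metanames.items()}
    return {
        src_id: output_by_name[name]
        for src_id, name in source_metanames.items()
    }


# Two source indexes with the same metanames listed in a different order.
index_a = {1: "title", 2: "author"}
index_b = {1: "author", 2: "title"}
output = dict(index_a)  # assume the output index follows index_a's numbering

map_b = build_id_map(index_b, output)
print(map_b)  # {1: 2, 2: 1}
```

        Copying index_b's IDs straight across would file "author" words
        under "title"; looking the ID up by name avoids that.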

    SWISH::Filters and updates
        The web spider was updated to work better with
        SWISH::Filter by default and also make it easier to use the spider
        default along with a spider config file. See for details.

        SWISH::Filter was updated. The way filters are created has changed.
        If you created your own filters you will need to update them. Take a
        look at SWISH::Filter and the filters included in the distribution.

    Updates to Documentation
        Richard Morin submitted formatting and punctuation updates to the
        README and INSTALL docs.

    Added -R option to support IDF word weighting in ranking. (karman)
        Added Inverse Document Frequency calculation to the getrank()
        routine. This will allow the relative frequency of a word in
        relationship to other words in the query to impact the ranking of

        Example: if 'foo' is present twice as often as 'bar' in the
        collection as a whole, a search for 'foo bar' will weight documents
        with 'bar' more heavily (i.e., higher rank) than those with 'foo'.

        The impact is greatest when OR'ing words in a query rather than
        AND'ing them (which is the default).
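
        A worked example of the idea, using the textbook IDF formula
        log(N / df). This is only an assumption for illustration: the
        exact calculation inside getrank() is not shown here, and the
        document counts below are invented.

```python
import math


def idf(total_docs, docs_with_word):
    """Textbook inverse document frequency: rarer words score higher."""
    return math.log(total_docs / docs_with_word)


total_docs = 1000
doc_freq = {"foo": 200, "bar": 100}  # 'foo' is twice as common as 'bar'

weights = {word: idf(total_docs, df) for word, df in doc_freq.items()}
for word, weight in weights.items():
    print(f"{word}: {weight:.3f}")
# foo: 1.609
# bar: 2.303
```

        The rarer word 'bar' gets the larger weight, so a document
        matching 'bar' is pushed up relative to one matching only 'foo'.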

        Also added Rank discussion to the FAQ.

    Updates to the example scripts
        Updated as suggested by Bill Schell for an
        optimization when all words in a document are highlighted.

        Updated search.cgi and to use the internal
        stemmers via the SWISH::API module as suggested by Jonas Wolf.

    Leak when using C library
        David Windmueller found a memory leak when calling multiple searches
        on a swish handle. The problem was swish loading the pre-sorted
        property index on every search, even after the table had been loaded
        into memory.
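
        The shape of the fix can be sketched as "load once, then reuse".
        The class below is a hypothetical stand-in for a swish handle,
        not the real C API, and the loader just counts how often it is
        called.

```python
class SearchHandle:
    """Hypothetical stand-in for a swish handle; not the real C API."""

    def __init__(self, load_table):
        self._load_table = load_table  # callable that reads the table from disk
        self._sorted_table = None      # nothing loaded yet

    def search(self, query):
        # Load the pre-sorted table only on the first search, then reuse it.
        if self._sorted_table is None:
            self._sorted_table = self._load_table()
        return [doc for doc in self._sorted_table if query in doc]


loads = []


def load_table():
    loads.append(1)  # record each time the table is read
    return ["swish-e manual", "swish-e faq", "install notes"]


handle = SearchHandle(load_table)
handle.search("swish")
handle.search("install")
print(len(loads))  # 1
```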

    Swish.cgi now kills swish-e on time out
        The example script swish.cgi uses an alarm (on platforms that
        support alarm) to abort processing after some number of seconds, but
        it was not killing the child process, swish-e. Bill Schell submitted
        a patch to kill the child when the alarm triggers.
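
        The same pattern, sketched in Python rather than the Perl used by
        swish.cgi: an alarm interrupts the wait, and the handler path
        explicitly kills and reaps the child. The names are made up for
        the example, and it assumes a Unix platform with SIGALRM and a
        call from the main thread.

```python
import signal
import subprocess
import sys


class SearchTimeout(Exception):
    """Raised when the child is still running when the alarm fires."""


def run_with_alarm(cmd, seconds):
    """Run cmd and return its stdout, killing the child after `seconds`."""
    child = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    def on_alarm(signum, frame):
        raise SearchTimeout()

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)
    try:
        output, _ = child.communicate()
        return output
    except SearchTimeout:
        child.kill()          # without this the child keeps running
        child.stdout.close()
        child.wait()          # reap it so no zombie is left behind
        raise
    finally:
        signal.alarm(0)       # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
```

        Aborting the parent alone is not enough; the kill() call is the
        part the original script was missing.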

    The template was renamed to
        The template was renamed because it's used by swish.cgi, not by
        search.cgi, which was confusing.

    Updates to the search.cgi
        The example script search.cgi was updated to work better with
        mod_perl and to use external template files and style sheets.

    New MS Word Filter
        James Job provided the SWISH::Filter::Doc2html filter that uses the
        wvWare program for filtering MS
        Word documents. If both catdoc and wvWare are installed then wvWare
        will be used.

        wvWare is reported to do a good job at converting MS Word docs to
        HTML. In a few tests it did work well, but in other cases it failed
        to generate correct output. It was also much, much slower than catdoc.
        I tested with wvWare 0.7.3 on Debian Linux. Testing with both is

    Change in way symbolic links are followed
        John-Marc Chandonia pointed out that if a symlink is skipped by
        FileRules, then the actual file/directory is marked as "already
        seen" and cannot be indexed by other links or directly.

        Now, files and directories are not marked "already seen" until after
        passing FileRules (i.e., after a file is actually indexed or a
        directory is processed).
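
        A minimal sketch of the new ordering. It uses os.path.realpath()
        as a stand-in for however swish-e identifies the actual file,
        and passes_rules is a hypothetical callback playing the part of
        FileRules.

```python
import os


def collect_files(paths, passes_rules):
    """Return real paths to index, marking a file 'seen' only once accepted."""
    seen = set()
    to_index = []
    for path in paths:
        real = os.path.realpath(path)  # resolve symlinks to the actual file
        if real in seen:
            continue                   # already indexed under another name
        if not passes_rules(path):
            continue                   # skipped, but NOT marked as seen
        seen.add(real)                 # mark only after the rules pass
        to_index.append(real)
    return to_index
```

        With the old ordering, seen.add() would sit before the rules
        check, so a rejected symlink would hide its target from every
        later path.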

    Could not set SwishSetSort() more than once
        David Windmueller found a problem when trying to set the sort order
        more than once on an existing search object. Memory was not
        correctly reset after clearing the previous sort values.

    Access MetaNames and PropertyNames from API
        Patch provided by Jamie Herre to access the MetaNames and
        PropertyNames via the C API and to test via the testlib program.
        SWISH::API was also updated to access this data.

    SwishResultPropertyULong() bug fixed
        David Windmueller reported that SwishResultPropertyULong() was
        returning ULONG_MAX on all calls. This was fixed.

    Null written to wrong location in file.c
        Bill Schell with the help of valgrind found a null written past the
        end of a buffer in file.c in the code that supports the old parsers.
        This resulted in a segfault while indexing a large set of XML

    Fixed problem when indexing very large files
        Steve Harris reported a problem when indexing a very large document
        that caused an integer overflow. José Ruiz updated to use unsigned

    Bump word position on block tags with HTML2 parser
        Peter Karman pointed out that the libxml2 HTML parser was allowing
        phrase matches across block level html elements. Swish now bumps the
        word position on these elements.
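
        To show why the bump matters, here is a toy indexer built on
        Python's html.parser (not libxml2, and not the swish-e code).
        The list of block tags and the bump size of 1 are assumptions
        for the example.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "br"}  # illustrative subset
BUMP = 1  # hypothetical gap added at each block-level tag


class PositionIndexer(HTMLParser):
    """Record (word, position) pairs, bumping the position at block tags."""

    def __init__(self):
        super().__init__()
        self.position = 0
        self.words = []

    def _bump(self, tag):
        if tag in BLOCK_TAGS:
            self.position += BUMP

    def handle_starttag(self, tag, attrs):
        self._bump(tag)

    def handle_endtag(self, tag):
        self._bump(tag)

    def handle_data(self, data):
        for word in data.lower().split():
            self.position += 1
            self.words.append((word, self.position))


def index_html(html):
    parser = PositionIndexer()
    parser.feed(html)
    parser.close()
    return parser.words


def phrase_match(words, phrase):
    """True if the phrase's words occur at consecutive positions."""
    positions = {}
    for word, pos in words:
        positions.setdefault(word, set()).add(pos)
    terms = phrase.lower().split()
    return any(
        all(start + i in positions.get(term, ()) for i, term in enumerate(terms))
        for start in positions.get(terms[0], ())
    )


words = index_html("<p>red apple</p><p>pie chart</p>")
print(phrase_match(words, "red apple"))  # True
print(phrase_match(words, "apple pie"))  # False
```

        Without the bump, "apple" and "pie" would get adjacent positions
        and the phrase would match across the two paragraphs.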

Bill Moseley

Received on Thu Dec 2 10:37:58 2004