Swish 2.4.3 Pre Release 1 is now on the Swish-e.org download page:
http://swish-e.org/Download/
It would be a huge help if you could try it out. This version is NOT
compatible with indexes created with previous versions of swish. That
means you will need to reindex your documents to test.
Thanks to all swish-e users who have contributed patches and reported
issues with swish.
Here's a list of changes from the CHANGES file. Those that know me
will know to look for spelling errors....
--------- Changes since 2.4.2 ------------------------
Version 2.4.3-pr1 - Wed Dec 1 09:52:50 PST 2004
"Fixed" libxml2's change in UTF8Toisolat1() return value
Bernhard Weisshuhn supplied a patch to parser.c for checking the
return value of UTF8Toisolat1(). Seems that libxml2 now returns the
number of characters converted instead of zero for success.
http://bugzilla.gnome.org/show_bug.cgi?id=153937
Added swish-config and pkg-config
Swish now provides a swish-config script and config file for the
pkg-config utility. These tools help when building programs that
link with the swish-e library.
The SWISH::API Makefile.PL program uses swish-config to locate the
installation directory of swish-e. This should make building
SWISH::API easier when swish-e is installed in a non-standard
location.
Fixed rank bias in merge
Peter van Dijk noticed that MetaNamesRank settings were not being
copied to the output index when merging.
Added SwishFuzzy function
SwishFuzzy function (SWISH::API::Fuzzy) lets you stem a word without
first searching. This might be helpful for playing with queries
prior to the search.
Fixed translate character table
Michael Levy found an error in the table used to translate 8859-1 to
ascii7. Luckily, it was an upper case translation and the table is
only used on lower case characters.
MetaNamesRank documentation
Changed the 'not yet implemented' caveat to 'implemented but
experimental'.
Added Continuation option to config processing
You can now use continuation lines in the config file:
IgnoreWords \
the \
am \
is \
are \
was
The backslash must be the last character on the line.
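As a rough illustration of how continuation lines of this kind are usually folded into one logical line, here is a minimal Python sketch. It is not swish-e's config parser; the function name join_continuation_lines is made up for the example, and the behaviour for a backslash on the final line is an assumption.

```python
def join_continuation_lines(text):
    """Join physical lines that end in a backslash into one logical line."""
    logical, pending = [], ""
    for line in text.splitlines():
        if line.endswith("\\"):
            # Backslash is the very last character: drop it and keep collecting.
            pending += line[:-1]
        else:
            # Anything after the backslash (even a space) ends the continuation.
            logical.append(pending + line)
            pending = ""
    if pending:
        # Assumption: a dangling backslash on the last line keeps what was read.
        logical.append(pending)
    return logical


config = "IgnoreWords \\\nthe \\\nam \\\nis \\\nare \\\nwas\n"
for line in join_continuation_lines(config):
    directive, *words = line.split()
    print(directive, words)
```

Note that a line such as "the \ " (backslash followed by a space) is not treated as a continuation in this sketch, which mirrors the restriction stated above.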
Fixed Buzzwords (and other word lists entered in the config)
Words entered in config were not converted to lower case before
storing in the index.
Fixed metaname mapping problem in Merge
Peter Karman found an error when merging indexes where the source
indexes had the same metanames, but listed in a different order in
their config files. Words would then be indexed under the wrong
metaID number in the output index.
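The general idea behind a fix like this, matching metanames by name rather than by their position in the config, can be sketched as follows. This is only an illustration in Python: the tables, the metaID numbers, and the function build_metaid_remap are hypothetical and do not reflect swish-e's internal structures.

```python
def build_metaid_remap(source_metanames, output_metanames):
    """Map each source metaID to the output metaID that has the same name."""
    name_to_output_id = {name: mid for mid, name in output_metanames.items()}
    return {
        source_id: name_to_output_id[name]
        for source_id, name in source_metanames.items()
    }


# Hypothetical metaID tables: same metanames, listed in a different order.
index_a = {1: "swishdefault", 2: "title", 3: "author"}
index_b = {1: "swishdefault", 2: "author", 3: "title"}
output = dict(index_a)  # assume the output index adopts the first index's IDs

remap_b = build_metaid_remap(index_b, output)
print(remap_b)  # words under index_b's metaID 2 must go to output metaID 3
```

Copying metaIDs across unchanged would file index_b's "author" words under "title" in the output, which is the kind of mix-up described above.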
SWISH::Filters and spider.pl updates
The web spider spider.pl was updated to work better with
SWISH::Filter by default and also to make it easier to use the spider
defaults along with a spider config file. See spider.pl for details.
SWISH::Filter was updated. The way filters are created has changed.
If you created your own filters you will need to update them. Take a
look at SWISH::Filter and the filters included in the distribution.
Updates to Documentation
Richard Morin submitted formatting and punctuation updates to the
README and INSTALL docs.
Added -R option to support IDF word weighting in ranking. (karman)
Added Inverse Document Frequency calculation to the getrank()
routine. This will allow the relative frequency of a word in
relationship to other words in the query to impact the ranking of
documents.
Example: if 'foo' is present twice as often as 'bar' in the
collection as a whole, a search for 'foo bar' will weight documents
with 'bar' more heavily (i.e., higher rank) than those with 'foo'.
The impact is greatest when OR'ing words in a query rather than
AND'ing them (which is the default).
Also added Rank discussion to the FAQ.
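To make the 'foo'/'bar' example above concrete, here is a small Python sketch using the textbook IDF formula log(N / df). This is an assumption for illustration only; the exact calculation in getrank() may differ, and the document counts are invented.

```python
import math


def idf(total_docs, docs_with_word):
    """Textbook inverse document frequency: rarer words get larger weights."""
    return math.log(total_docs / docs_with_word)


# Invented collection statistics: 'foo' appears in twice as many docs as 'bar'.
total_docs = 1000
doc_freq = {"foo": 200, "bar": 100}
weights = {word: idf(total_docs, df) for word, df in doc_freq.items()}


def score(doc_term_counts):
    """Sum of (term count * IDF weight) over the query words found in a doc."""
    return sum(
        count * weights[word]
        for word, count in doc_term_counts.items()
        if word in weights
    )


print(weights)
print(score({"foo": 1}), score({"bar": 1}))
```

With an OR query, a document that contains only 'bar' scores higher here than one that contains only 'foo'. With AND, every hit contains both words, so the difference in weights matters less.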
Updates to the example scripts
Updated PhraseHighlight.pm as suggested by Bill Schell for an
optimization when all words in a document are highlighted.
Updated search.cgi and PhraseHighlight.pm to use the internal
stemmers via the SWISH::API module as suggested by Jonas Wolf.
Leak when using C library
David Windmueller found a memory leak when calling multiple searches
on a swish handle. The problem was swish loading the pre-sorted
property index on every search, even after the table had been loaded
into memory.
swish.cgi now kills swish-e on timeout
The example script swish.cgi uses an alarm (on platforms that
support alarm) to abort processing after some number of seconds, but
it was not killing the child process, swish-e. Bill Schell submitted
a patch to kill the child when the alarm triggers.
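The pattern, stop waiting after a time limit and also kill the child process, looks roughly like this. The sketch is in Python rather than the Perl that swish.cgi is written in, it uses a timeout on the wait instead of alarm, and the slow command is only a stand-in for a long-running swish-e search.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; if it exceeds the time limit, kill the child and reap it."""
    child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, _ = child.communicate(timeout=seconds)
        return child.returncode, out
    except subprocess.TimeoutExpired:
        # Giving up on the wait is not enough: the child would keep running.
        child.kill()
        child.communicate()  # collect the exit status so no zombie is left
        return None, b""


# Stand-in for a search that takes too long.
slow = [sys.executable, "-c", "import time; time.sleep(30)"]
status, output = run_with_timeout(slow, 0.5)
print("timed out" if status is None else "finished")
```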
The template search.tt was renamed to swish.tt
The template was renamed because it's used by swish.cgi, not by
search.cgi, which was confusing.
Updates to search.cgi
The example script search.cgi was updated to work better with
mod_perl and to use external template files and style sheets.
New MS Word Filter
James Job provided the SWISH::Filter::Doc2html filter that uses the
wvWare (http://wvware.sourceforge.net/) program for filtering MS
Word documents. If both catdoc and wvWare are installed then wvWare
will be used.
wvWare is reported to do a good job at converting MS Word docs to
HTML. In a few tests it did work well, but in other cases it failed to
generate correct output. It was also much, much slower than catdoc.
I tested with wvWare 0.7.3 on Debian Linux. Testing with both is
recommended.
Change in way symbolic links are followed
John-Marc Chandonia pointed out that if a symlink is skipped by
FileRules, then the actual file/directory is marked as "already
seen" and cannot be indexed by other links or directly.
Now, files and directories are not marked "already seen" until after
passing FileRules (i.e., after a file is actually indexed or a
directory is processed).
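The ordering change described above can be sketched as follows. This Python illustration works on a plain list of (path, file_id) pairs rather than a real filesystem; the names index_entries and passes_file_rules are hypothetical, and file_id stands in for whatever identifies the real file behind a link.

```python
def index_entries(entries, passes_file_rules):
    """Return the paths that get indexed, visiting each real file at most once.

    entries: list of (path, file_id); file_id identifies the underlying file.
    """
    seen = set()
    indexed = []
    for path, file_id in entries:
        if file_id in seen:
            continue  # this real file has already been indexed
        if not passes_file_rules(path):
            # Skipped by the rules: do not mark it, so another path may reach it.
            continue
        seen.add(file_id)  # mark only after the rules have been passed
        indexed.append(path)
    return indexed


entries = [
    ("skip/link-to-doc", 42),   # symlink rejected by the rules
    ("docs/doc.html", 42),      # the same real file, reached directly
    ("docs/alias.html", 42),    # a second link to that file
]
result = index_entries(entries, lambda path: not path.startswith("skip/"))
print(result)
```

If the file were marked as seen before the rules were checked, the rejected symlink would block docs/doc.html from ever being indexed.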
Could not set SwishSetSort() more than once
David Windmueller found a problem when trying to set the sort order
more than once on an existing search object. Memory was not
correctly reset after clearing the previous sort values.
Access MetaNames and PropertyNames from API
Patch provided by Jamie Herre to access the MetaNames and
PropertyNames via the C API and to test via the testlib program.
SWISH::API was also updated to access this data.
SwishResultPropertyULong() bug fixed
David Windmueller reported that SwishResultPropertyULong() was
returning ULONG_MAX on all calls. This was fixed.
Null written to wrong location in file.c
Bill Schell with the help of valgrind found a null written past the
end of a buffer in file.c in the code that supports the old parsers.
This resulted in a segfault while indexing a large set of XML
documents.
Fixed problem when indexing very large files
Steve Harris reported a problem when indexing a very large document
that caused an integer overflow. José Ruiz updated the code to use
unsigned integers.
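For readers unfamiliar with this class of bug, the following Python sketch shows how a large value wraps negative in a signed integer but fits in an unsigned one of the same width. The 32-bit width and the size used are assumptions for illustration; the announcement does not say which variable overflowed or how wide it was.

```python
import ctypes

# Hypothetical byte count for a very large document (about 3 GB).
size = 3_000_000_000

# ctypes truncates silently, which mimics C integer wrap-around.
as_signed = ctypes.c_int32(size).value     # exceeds 2**31 - 1, wraps negative
as_unsigned = ctypes.c_uint32(size).value  # fits, since it is below 2**32

print(as_signed, as_unsigned)
```

Switching to an unsigned type doubles the usable range but does not remove the limit, so values of 2**32 or more would still wrap in this sketch.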
Bump word position on block tags with HTML2 parser
Peter Karman pointed out that the libxml2 HTML parser was allowing
phrase matches across block level html elements. Swish now bumps the
word position on these elements.
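Why bumping the word position blocks phrase matches across elements can be seen in a small Python sketch. It assumes phrase matching requires consecutive word positions; the gap size BLOCK_BUMP and both helper functions are hypothetical and are not taken from the swish-e source.

```python
BLOCK_BUMP = 1  # hypothetical extra gap added at each block-level boundary


def assign_positions(blocks):
    """Number the words in order, leaving a gap after each block element."""
    position, index = 0, []
    for block in blocks:
        for word in block.lower().split():
            position += 1
            index.append((word, position))
        position += BLOCK_BUMP  # bump so the next block is not adjacent
    return index


def phrase_match(index, phrase):
    """True if the phrase's words occur at consecutive positions."""
    words = phrase.lower().split()
    where = {}
    for word, position in index:
        where.setdefault(word, set()).add(position)
    return any(
        all(start + offset in where.get(word, ()) for offset, word in enumerate(words))
        for start in where.get(words[0], ())
    )


# Two block elements, e.g. "<p>quick brown</p><p>fox jumps</p>".
index = assign_positions(["quick brown", "fox jumps"])
print(phrase_match(index, "quick brown"))  # within one block
print(phrase_match(index, "brown fox"))    # spans the block boundary
```

Without the bump, "brown" and "fox" would sit at positions 2 and 3 and the phrase "brown fox" would match even though the words are in different paragraphs.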
--
Bill Moseley
moseley@hank.org
Received on Thu Dec 2 10:37:58 2004