[swish-e] ANNOUNCE: new SWISH::HiLiter, HTML::HiLiter and Search::Tools on CPAN

From: Peter Karman <peter(at)>
Date: Sat Sep 26 2009 - 15:47:21 GMT
Some on this list use the SWISH::HiLiter CPAN module and its cousins,
HTML::HiLiter and Search::Tools.

New versions of each are on CPAN or on their way there.

The APIs for SWISH::HiLiter and HTML::HiLiter have changed. Both are now much
smaller modules and delegate the heavy lifting to Search::Tools. If you use any
of these modules in your code, please read the Changes and documentation notes
carefully as you will likely need to modify your existing code.

From the Changes for all of them:


0.05  26 Sep 2009
    * complete rewrite to use Search::Tools
    * The API has changed. Read the docs, esp the SYNOPSIS.


0.14  26 Sep 2009
    * rewrite to use Search::Tools. At the same time considered replacing
      HTML::Parser with XML::LibXML for speed reasons, but when comparing
      the RT queues for both, it became obvious that HTML::Parser was a much
      safer route. That, and I couldn't get tests in XML::LibXML to pass
      against libxml2 2.7.
    * The API has changed. Read the SYNOPSIS.
    * since Search::Tools normalizes everything to UTF-8, the output of
      HTML::HiLiter will always be UTF-8.  As a convenience, if the HiLiter
      encounters a http-equiv meta charset tag of anything other than
      ascii or utf-8, a new meta tagset will be inserted in its place
      indicating utf-8 encoding. If you really do not want to display UTF-8,
      you'll need to convert back to your desired encoding, using something
      like the Encode module.
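
For anyone who needs the conversion mentioned in the last item, here is a
minimal sketch using the core Encode module. The helper name
utf8_to_encoding and the sample string are made up for illustration and are
not part of HTML::HiLiter or Search::Tools. Because it is an assumption
whether your copy of the output arrives as a decoded character string or as
raw UTF-8 bytes, the helper accepts either.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of HTML::HiLiter or Search::Tools.
# Accepts either a decoded character string or raw UTF-8 bytes and
# returns bytes in the requested target encoding.
sub utf8_to_encoding {
    my ( $text, $target ) = @_;

    # Decode only if Perl does not already treat $text as characters.
    my $chars = utf8::is_utf8($text) ? $text : decode( 'UTF-8', $text );

    # Characters the target cannot represent become a substitution character.
    return encode( $target, $chars );
}

# Stand-in for hilited output: "cafe" with an acute accent, as raw UTF-8 bytes.
my $hilited = "caf\xc3\xa9";
my $latin1  = utf8_to_encoding( $hilited, 'iso-8859-1' );
printf "%d bytes in, %d bytes out\n", length($hilited), length($latin1);
```

If you re-encode a whole page this way, any meta charset tag that declares
utf-8 would presumably need to be changed to match the new encoding.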


0.24  19 Sept 2009
    * thanks to Henry at zen for prompting the bug fixes and improvements
      in this release.
    * fix Data::Dump calls from pp() to fully-qualified.
    * Snipper->snip() will always return UTF-8 encoded text.
    * rename Snipper methods snipper_name, snipper_force and snipper_type
      to type_used, force and type.
    * document Snipper->type().
    * fix some off-by-one errors in all the snip() algorithms
    * fix the debugging code in Snipper
    * add sanity check fallback to plain() hiliter to persevere if plain
      regex obviously fails.
    * add ignore_fields feature
    * add treat_uris_like_phrases feature
    * RegExp, RegExp::Keywords, RegExp::Keyword and Keywords are all
      deprecated in favor of the new, tidier and cleaner QueryParser,
      Query and RegEx classes. Backwards compatibility is preserved
      for existing code, but users should move to the new API as
      documented in Search::Tools.
      RegExp will carp every time you build() with it.
    * added new Tokenizer, Token and TokenList XS code for much faster snipping.
    * added PP versions of tokenizing code, both for benchmarking and
      comparison. As expected, XS is much faster. The extra speed makes it
      possible to be more accurate in snippet extraction without sacrificing

0.25  19 Sept 2009
    * add missing $VERSION back to satisfy CPAN

0.26  23 Sept 2009
    * fix a couple of Perl::Critic warnings (trivial imo)
    * fix repos and homepage links in Makefile.PL
    * fix a couple of regex escape bugs in HiLiter
    * fix an innocuous bug in Object that passed extra args to
      QueryParser->new in _normalize_args
    * add \002/\003 no-hiliting marker support in HiLiter
      (for HTML::HiLiter)
    * HiLiter->light() now returns UTF-8 encoded text like
      Snipper->snip() does.
    * fix regex build bug where phrase could be separated by multiple
      whitespace chars.
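
To illustrate the kind of problem the last item describes, here is a rough
sketch of building a phrase regex that tolerates any run of whitespace
between words. This is not the Search::Tools implementation; the function
name phrase_regex is invented for the example, and the real RegEx classes
do considerably more (word boundaries, UTF-8 handling and so on).

```perl
use strict;
use warnings;

# Hypothetical helper, not the Search::Tools implementation.
# Builds a case-insensitive regex for a phrase in which the words may be
# separated by one or more whitespace characters of any kind.
sub phrase_regex {
    my ($phrase) = @_;

    # split ' ' skips leading whitespace and splits on whitespace runs.
    my @words = split ' ', $phrase;

    # Escape regex metacharacters in each word, then join with \s+.
    my $pattern = join '\s+', map { quotemeta } @words;

    return qr/$pattern/i;
}

my $re = phrase_regex('quick brown fox');
for my $text ( "the quick brown fox", "the quick \n  brown\tfox" ) {
    print $text =~ $re ? "match\n" : "no match\n";
}
```

Joining on a literal single space instead of \s+ would miss the second
sample, where the words are split across a newline and a tab.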

Peter Karman  .  .  peter(at)
Received on Sat Sep 26 11:47:25 2009