Some on this list use the SWISH::HiLiter CPAN module and its cousins,
HTML::HiLiter and Search::Tools.
New versions of each are either on CPAN already or on their way there.
The APIs for SWISH::HiLiter and HTML::HiLiter have changed. Both are now much
smaller modules and delegate the heavy lifting to Search::Tools. If you use any
of these modules in your code, please read the Changes and documentation notes
carefully as you will likely need to modify your existing code.
From the Changes for all of them:
SWISH::HiLiter:
0.05 26 Sep 2009
* complete rewrite to use Search::Tools
* The API has changed. Read the docs, esp the SYNOPSIS.
HTML::HiLiter:
0.14 26 Sep 2009
* rewrite to use Search::Tools. At the same time considered replacing
HTML::Parser with XML::LibXML for speed reasons, but when comparing
the RT queues for both, it became obvious that HTML::Parser was a much
safer route. That, and I couldn't get tests in XML::LibXML to pass
against libxml2 2.7.
* The API has changed. Read the SYNOPSIS.
* since Search::Tools normalizes everything to UTF-8, the output of
HTML::HiLiter will always be UTF-8. As a convenience, if the HiLiter
encounters a http-equiv meta charset tag of anything other than
ascii or utf-8, a new meta tagset will be inserted in its place
indicating utf-8 encoding. If you really do not want to display UTF-8,
you'll need to convert back to your desired encoding, using something
like the Encode module.
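
For anyone who needs to go back from UTF-8 to another encoding, here is a minimal sketch using only the core Encode module. The helper name to_encoding and the sample string are hypothetical and not part of HTML::HiLiter or Search::Tools. It also assumes that a string without Perl's internal UTF-8 flag holds UTF-8 octets, which may not hold in your application.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of HTML::HiLiter or Search::Tools).
sub to_encoding {
    my ( $text, $target ) = @_;

    # Assumption: a string without Perl's internal UTF-8 flag holds
    # UTF-8 octets and must be decoded into characters first.
    my $chars = utf8::is_utf8($text) ? $text : decode( 'UTF-8', $text );

    # Characters with no equivalent in $target are replaced with a
    # substitution character by default.
    return encode( $target, $chars );
}

# Stand-in for highlighter output: "cafe" with an acute accent on the
# final letter, written as UTF-8 octets.
my $highlighted = "caf\xc3\xa9";
my $latin1      = to_encoding( $highlighted, 'ISO-8859-1' );
printf "%d bytes in, %d bytes out\n", length($highlighted), length($latin1);
```

If you convert the output this way, remember that any meta charset tag in the document will still say utf-8, so you would need to adjust that yourself.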
Search::Tools:
0.24 19 Sept 2009
* thanks to Henry at zen for prompting the bug fixes and improvements
in this release.
* fix Data::Dump calls from pp() to fully-qualified.
* Snipper->snip() will always return UTF-8 encoded text.
* rename Snipper methods snipper_name, snipper_force and snipper_type
to type_used, force and type.
* document Snipper->type().
* fix some off-by-one errors in all the snip() algorithms
* fix the debugging code in Snipper
* add sanity check fallback to plain() hiliter to persevere if plain
regex obviously fails.
* add ignore_fields feature
* add treat_uris_like_phrases feature
* RegExp, RegExp::Keywords, RegExp::Keyword and Keywords are all
deprecated in favor of the new, tidier and cleaner QueryParser,
Query and RegEx classes. Backwards compatibility is preserved
for existing code, but users should move to the new API as
documented in Search::Tools.
RegExp will carp every time you build() with it.
* added new Tokenizer, Token and TokenList XS code for much faster snipping.
* added PP versions of tokenizing code, both for benchmarking and
comparison. As expected, XS is much faster. The extra speed makes it
possible to be more accurate in snippet extraction without sacrificing
performance.
0.25 19 Sept 2009
* add missing $VERSION back to Keywords.pm to satisfy CPAN
0.26 23 Sept 2009
* fix a couple of Perl::Critic warnings (trivial imo)
* fix repos and homepage links in Makefile.PL
* fix a couple of regex escape bugs in HiLiter
* fix an innocuous bug in Object that passed extra args to
QueryParser->new in _normalize_args
* add \002/\003 no-hiliting marker support in HiLiter
(for HTML::HiLiter)
* HiLiter->light() now returns UTF-8 encoded text like
Snipper->snip() does.
* fix regex build bug where phrase could be separated by multiple
whitespace chars.
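
To illustrate the kind of phrase matching that last item is about, here is a rough sketch of a phrase regex that tolerates any run of whitespace between words. This is not the Search::Tools implementation. The helper name phrase_regex is made up for the example, and the real module builds considerably more elaborate patterns.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
sub phrase_regex {
    my ($phrase) = @_;

    # Split the phrase into words, ignoring empty leading fields.
    my @words = grep { length } split /\s+/, $phrase;

    # Escape each word, then allow one or more whitespace characters
    # (spaces, tabs, newlines) between consecutive words.
    my $pattern = join '\s+', map { quotemeta($_) } @words;

    return qr/$pattern/i;
}

my $re = phrase_regex('swish  perl');
print "matched\n" if "Swish\n  Perl bindings" =~ $re;
```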
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Sat Sep 26 11:47:25 2009