Since unrtf writes errors to the application log on my server and takes
a long time to parse RTF files, I'm working on a new rtf2htm Perl
module. It uses rtf2htm.php. The link to the program is in the code
near the bottom if you'd like to test it.

The problem I get when running swish-filter-test or Dirtree.pl is that
when it hits an RTF document, the filter tries to kick in and then
outputs the following error:

Could not open input file: rtf2htm.php

If anyone has time to compare notes with me, that would be great. I'm
on a Win2k3 box.

Here is the .pm code (I modified this from Rtf2html.pm):

package SWISH::Filters::Rtf2htm;
use strict;

use vars qw/ $VERSION /;

$VERSION = '0.02';

sub new {
    my ( $class ) = @_;

    my $self = bless {
        # list of types this filter handles
        mimetypes => [ qr!(text|application)/rtf! ],
    }, $class;

    return $self->set_programs( 'rtf2htm' );
}

sub filter {
    my ( $self, $doc ) = @_;

    # Grab output from running program
    my $content = $self->run_rtf2htm( $doc->fetch_filename ) || return;

    # update the document's content type
    $doc->set_content_type( 'text/html' );

    # return the document
    return \$content;
}
1;

__END__

=head1 NAME

SWISH::Filters::Rtf2htm - Perl extension for filtering RTF documents with Swish-e

=head1 DESCRIPTION

This is a plug-in module that uses the rtf2htm program to convert RTF
documents to HTML for indexing by Swish-e. rtf2htm can be downloaded
from:

http://www.penguin.cz/~martinmv/index_eng.html

The program rtf2htm must be installed and in your PATH before running
Swish-e.

Tested with Windows 2003, PHP 5.0.3 and rtf2htm 3.5.

=head1 SEE ALSO

L<SWISH::Filter>

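On the "Could not open input file: rtf2htm.php" error: that message is what the PHP command-line interpreter prints when it cannot open the script named on its command line. PATH is searched for executables, not for script arguments, so one possible cause is that the script name is being resolved relative to the current directory of the indexing process. This is only a guess, since the post does not show how rtf2htm invokes PHP. Below is a minimal sketch of resolving the script to a full path first. find_script and php_command are hypothetical helpers, not part of SWISH::Filter, and passing the RTF file as a command-line argument to rtf2htm.php is also an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the absolute path of $script in the first
# directory of @dirs that contains it, or nothing if no directory does.
sub find_script {
    my ( $script, @dirs ) = @_;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile( $dir, $script );
        return File::Spec->rel2abs($path) if -f $path;
    }
    return;
}

# Hypothetical helper: build the argument list for the php CLI.
# Assumes rtf2htm.php accepts the RTF file name as its first argument.
sub php_command {
    my ( $script_path, $rtf_file ) = @_;
    return ( 'php', $script_path, $rtf_file );
}

# Look in the current directory first, then in every PATH directory.
my $script = find_script( 'rtf2htm.php', File::Spec->curdir, File::Spec->path );
print defined $script
    ? "found: $script\n"
    : "rtf2htm.php not found in the current directory or PATH\n";
```

If the script is reported as not found here, giving PHP the full path to rtf2htm.php (or placing the script in one of the searched directories) would be the first thing to try.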
Received on Thu Jun 30 15:42:43 2005