(no subject)

From: Revillini, James <JRevillini(at)not-real.txcc.commnet.edu>
Date: Thu Jun 30 2005 - 22:42:30 GMT
Since unrtf writes errors to the application log on my server and takes
a long time to parse RTF files, I'm working on a new rtf2htm Perl
module.  It uses rtf2htm.php.  The link to the program is in the code
near the bottom if you'd like to test.

When I run swish-filter-test or DirTree.pl and it hits an RTF document,
the filter tries to kick in and then outputs the following error:

Could not open input file: rtf2htm.php

If anyone has time to compare notes with me, it would be great.  I'm
on a Win2k3 box.
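
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: "Could not open input file" is the message the PHP command-line binary prints when it cannot open the script it was given. That can happen when a bare script name such as rtf2htm.php is resolved against a working directory that does not contain it. A minimal Perl sketch of one workaround is to resolve the script to an absolute path before handing it to php. The helper name find_script is hypothetical and is not part of SWISH::Filter.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of SWISH::Filter): search the given
# directories for a script file and return its absolute path.
# Returns undef (or an empty list in list context) if it is not found.
sub find_script {
    my ( $name, @dirs ) = @_;

    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile( $dir, $name );

        # Only accept plain files; rel2abs makes the result independent
        # of whatever the current working directory is later on.
        return File::Spec->rel2abs($candidate) if -f $candidate;
    }

    return;
}
```

For example, find_script( 'rtf2htm.php', File::Spec->path ) searches the directories listed in PATH. The result could then be passed to php as the script argument in place of the bare file name.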

Here is the .pm code (I modified this from Rtf2html.pm):

package SWISH::Filters::Rtf2htm;
use strict;

use vars qw/ $VERSION /;

$VERSION = '0.02';

sub new {
    my ( $class ) = @_;

    my $self = bless {
        mimetypes   => [ qr!(text|application)/rtf! ], # list of types this filter handles
    }, $class;

    return $self->set_programs( 'rtf2htm' );
}

sub filter {
    my ( $self, $doc ) = @_;

    # Grab output from running program
    my $content = $self->run_rtf2htm( $doc->fetch_filename ) || return;

    # update the document's content type
    $doc->set_content_type( 'text/html' );

    # return the document
    return \$content;
}

1;

__END__

=head1 NAME

SWISH::Filters::Rtf2htm - Perl extension for filtering RTF documents with Swish-e

=head1 DESCRIPTION

This is a plug-in module that uses the rtf2htm program to convert RTF
documents to HTML for indexing by Swish-e.  rtf2htm can be downloaded from:

    http://www.penguin.cz/~martinmv/index_eng.html

The program rtf2htm must be installed and in your PATH before running
Swish-e.

Tested with windows 2003, php 5.0.3 and rtf2htm 3.5

=head1 SEE ALSO

L<SWISH::Filter>




Received on Thu Jun 30 15:42:43 2005