Re: PowerPoint module for

From: Bill Moseley <moseley(at)>
Date: Thu Jul 08 2004 - 15:20:18 GMT
On Thu, Jul 08, 2004 at 06:51:23AM -0700, Alan Ivey wrote:
> I edited like you showed, and now I'm
> trying to write a There isn't a binary
> that converts ppt to txt, but rather html (ala
> ppthtml). The only problem is, the <TITLE/> is the
> full filename and path, which, with SWISH-E, makes it
> like /tmp/sddwt4g490 or whatever.

Does ppthtml extract a valuable title from the document?

> I know I can pipe the output through w3m with some
> options to strip the HTML tags to make it text, but
> I'm having a hard time figuring out how to make it
> work in a module. Using the as an example,
> I tried about 20 different things I was hoping would
> work but no luck. 

Do you even need to strip the HTML?  Just let swish-e do it with its
html parser.

> How would I change the line...
> my $content = $filter->run_program( $self->{ppthtml},
> $file )
> To do the bash equivalent of...
> ppthtml [filegoeshere] | w3m -dump -T text/html | perl
> -pe 's/\xa0/ /g'
> ?

If you really need to do that then there are a few ways.  First, the
filter could flag its output as a new content type, say that filtering
is not complete, and then let a secondary filter strip the html.

Another way would be to write the content (or the output of ppthtml)
to a file and then use another run_program() line to process it again.

Or you can just use a shell call, either backticks or system().

(I didn't try these):

    $content = `ppthtml $file | w3m -dump -T text/html | perl -pe 's/\xa0/ /g'`;
    system("ppthtml $file | w3m -dump -T text/html | perl -pe 's/\xa0/ /g' > outfile");

and then read the file back in.

I would likely not do either of those -- I try to avoid the shell for
security reasons.
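
For what it's worth, here is a rough, untested-against-ppthtml sketch of the
"write it to a file and run a second program" idea done without the shell.
It assumes ppthtml and w3m are on the PATH and that w3m will read the HTML
from a file named on its command line. The names capture() and ppt_to_text()
are made up for illustration and are not part of SWISH::Filter.

```perl
use strict;
use warnings;
use File::Temp ();

# Run a command without a shell and return everything it prints.
# List-form open hands each argument straight to exec(), so a file
# name containing spaces or ';' is never parsed as shell syntax.
sub capture {
    my @cmd = @_;
    open( my $fh, '-|', @cmd ) or die "Cannot run $cmd[0]: $!";
    binmode $fh;
    local $/;                      # slurp mode: read all output at once
    my $out = <$fh>;
    close $fh or die "$cmd[0] failed (exit status $?)";
    return defined $out ? $out : '';
}

# Hypothetical helper: ppt -> html -> text, with no shell involved.
sub ppt_to_text {
    my ($file) = @_;

    my $html = capture( 'ppthtml', $file );

    # Hand the intermediate HTML to w3m through a temporary file,
    # which File::Temp removes when $tmp goes out of scope.
    my $tmp = File::Temp->new( SUFFIX => '.html' );
    binmode $tmp;
    print {$tmp} $html;
    close $tmp or die "Cannot write temp file: $!";

    my $text = capture( 'w3m', '-dump', '-T', 'text/html', $tmp->filename );
    $text =~ s/\xa0/ /g;           # same cleanup as the perl -pe step
    return $text;
}
```

In the filter you would then call something like
my $content = ppt_to_text( $file ); in place of the backticks. The
substitution is done in the filter's own Perl, so the third stage of the
pipeline needs no extra process at all.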

Bill Moseley

Received on Thu Jul 8 08:20:32 2004