Re: Filtering MS Word Documents

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Oct 18 2005 - 20:45:34 GMT
On Tue, Oct 18, 2005 at 12:24:17PM -0700, Sebastian Jayaraj wrote:
> ++Checking filter [SWISH::Filters::Doc2txt=HASH(0x835f794)] for 
> application/x-msword
> ++ application/x-msword was not filtered by 
> SWISH::Filters::Doc2txt=HASH(0x835f794)

Odd.  Doc2txt sets this:

   mimetypes   => [ qr!application/(x-)?msword! ]

which should match your content type.  Again, here's my attempt:

    >> Starting to process new document: application/x-msword
     ++Checking filter [SWISH::Filters::Doc2txt=HASH(0x8432290)] for application/x-msword
     ++ application/x-msword *WAS* filtered by SWISH::Filters::Doc2txt=HASH(0x8432290)

Are you using some silly mail client that decides to wrap text when
you don't want it to?  Or does your mime type actually have a newline?
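
One way to rule the pattern itself in or out is to test it in isolation.
The following is only a sketch: the pattern is copied from the Doc2txt
setting quoted above, while check_type() and the sample strings are made
up for illustration.  Note that the pattern is not anchored, so a trailing
newline would not stop a match; a newline in the middle of the type would.

```perl
use strict;
use warnings;

# Pattern copied from the Doc2txt mimetypes setting quoted above.
my $pattern = qr!application/(x-)?msword!;

# Hypothetical helper: returns 1 if the content type matches, else 0.
sub check_type {
    my ( $type ) = @_;
    return $type =~ $pattern ? 1 : 0;
}

# Sample content types (assumptions for illustration).  The last two
# contain a newline: one at the end, one embedded in the type.
my @types = (
    'application/msword',
    'application/x-msword',
    "application/x-msword\n",
    "application/x-\nmsword",
);

for my $type ( @types ) {
    ( my $shown = $type ) =~ s/\n/\\n/g;    # show newlines as \n
    printf "[%s] %s\n", $shown,
        check_type( $type ) ? 'matches' : 'does not match';
}
```

If the type you are really getting behaves like the last sample, the
problem is in whatever produces the content type, not in the filter.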

Regardless, you can see that it should be matching.  So add a few
print statements for debugging:

In Filter.pm the mime type is passed to the can_filter_mimetype
method.  Throw in some print statements to see what mimetype and
pattern are really being checked.  For starters:

sub can_filter_mimetype {
    my ( $self, $content_type ) = @_;

    die "Must supply content_type to can_filter_mimetype()" unless $content_type;
    for my $pattern ( $self->mimetypes ) {

warn "checking if content type [$content_type] matches pattern [$pattern]\n";
warn " and it ", ($content_type =~ /$pattern/ ? 'matches' : 'does not match'), "\n";
        return $pattern if $content_type =~ /$pattern/;
    }
    return;
}
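
The square brackets in those warn lines make a stray newline visible, but
a carriage return or tab can still be hard to spot on a terminal.  As a
rough sketch (the visible() helper is made up here, it is not part of
SWISH::Filter), Data::Dumper with Useqq set will print such characters
as escapes:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: return the string in double quotes with
# control characters such as newline, carriage return, and tab
# written as backslash escapes.
sub visible {
    my ( $string ) = @_;
    local $Data::Dumper::Useqq  = 1;    # escape control characters
    local $Data::Dumper::Terse  = 1;    # no "$VAR1 = " prefix
    local $Data::Dumper::Indent = 0;    # keep output on one line
    return Dumper( $string );
}

# Sample value with a trailing CR/LF (an assumption for illustration).
warn "content type is ", visible( "application/x-msword\r\n" ), "\n";
```

Inside can_filter_mimetype you could pass $content_type to the same
helper in place of the sample string.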



-- 
Bill Moseley
moseley@hank.org

Received on Tue Oct 18 13:45:36 2005