Re: Swish-e Filtering on Win2003

From: Philippus, Brian <BPhilippus(at)not-real.nevp.com>
Date: Thu Mar 18 2004 - 18:03:43 GMT
BTW, I have tried this on two servers, and get the same problem on both
(probably because I did the same thing).

Line 348 (with some trailing lines) of Handle.pm is 
   if (ref($fd) && "".$fd =~ /GLOB\(/o) {
	# It's a glob reference; Alias it as we cannot get name of anon GLOBs
	my $n = qualify(*GLOB);
	*GLOB = *{*$fd};
	$fd =  $n;
    } elsif ($fd =~ m#^\d+$#) {
	# It's an FD number; prefix with "=".
	$fd = "=$fd";
    }
Line 358 (with a trailing line) of Handle.pm is
	open($io, _open_mode_string($mode) . '&' . $fd)
	? $io : undef;
Line 338 of open3.pm is
	$fd->{tmp_copy}->close or croak "Can't close: $!";

Not really understanding Perl, I put in a few print statements and the code
wouldn't run.  I'd be happy to put in any print statement anyone can suggest
that might help narrow the problem.
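
One option that avoids editing the core modules, as a rough sketch only: install a warning handler that appends a stack trace, so the "uninitialized value" warnings from Handle.pm show which caller passed the undefined handle. The helper name trace_warnings is hypothetical, and putting this near the top of the spider config file is an assumption about where it would be loaded early enough.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical helper: build a warning handler that appends a stack
# trace to each warning and hands the result to $sink (a code ref).
sub trace_warnings {
    my ($sink) = @_;
    return sub { $sink->( Carp::longmess( $_[0] ) ) };
}

# Send every traced warning to STDERR.
$SIG{__WARN__} = trace_warnings( sub { print STDERR $_[0] } );
```

With the handler in place, each warning should be followed by "called at" lines listing the call chain that led to it.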

My spider.pl's config file  (Everything else had been commented out and it
still generates the error.)
@servers = (
    {
        skip        => 0,         # Flag to disable spidering this host.
        base_url    => 'http://staging.nevp.com/fp/PDF/2004_dibetes_wllnss_day.pdf',
        same_hosts  => [ qw/mercury.nevp.com/ ],
        agent       => 'swish-e spider http://swish-e.org/',
        email       => 'myaddress@myemail.com',
        debug       => DEBUG_URL | DEBUG_SKIPPED | DEBUG_HEADERS,
        filter_content  => \&filter_content,        
    },
);    
my $filter;  # cache the object.

sub filter_content {
    my ( $uri, $server, $response, $content_ref ) = @_;

    # Uncomment this to enable debugging of SWISH::Filter
     $ENV{FILTER_DEBUG} = 1;

    my $content_type = $response->content_type;

    # Ignore text/* content type -- no need to filter
    return 1 if !$content_type || $content_type =~ m!^text/!;
    

    # Load the module - returns FALSE if cannot load module.
    unless ( $filter ) {
        eval { require SWISH::Filter };
        if ( $@ ) {
            $server->{abort} = $@;
            return;
        }
        $filter = SWISH::Filter->new;
        unless ( $filter ) {
            $server->{abort} = "Failed to create filter object";
            return;
        }
    }

    # If not filtered return false and doc will be ignored (not indexed)
    
    my $doc = $filter->convert(
        document => $content_ref,
        name     => $response->base,
        content_type => $content_type,
    );
    return unless $doc;
    # return unless $doc->was_filtered # could do this since checking for text/* above
    return if $doc->is_binary;

    $$content_ref = ${$doc->fetch_doc};

    # let's see if we can set the parser.
    $server->{parser_type} = $doc->swish_parser_type || '';

    return 1;
}

## Must return a true value!!

1;

My swish-e conf file (Again, I took out everything I could, yet still have
it run and generate the error)
IndexFile d:\SWISH-E\indicies\mercury.index

# Specify the URL (or URLs) to index:
    IndexDir ./spider.pl
    SwishProgParameters mercuryConfig.pl


-----Original Message-----
From: swish-e@sunsite.berkeley.edu [mailto:swish-e@sunsite.berkeley.edu] On
Behalf Of Bill Moseley
Sent: Thursday, March 18, 2004 9:41 AM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Swish-e Filtering on Win2003

On Thu, Mar 18, 2004 at 08:57:15AM -0800, Philippus, Brian wrote:
> I'm sorry, I guess my email got mutilated, I'll try again.

Oh, and I forgot to read all my mail before responding...

> ++Checking filter [SWISH::Filters::Pdf2HTML=HASH(0x1ea2d60)] for
> application/pdf
> 5592 Warning - http://staging.nevp.com/fp/PDF/2004_dibetes_wllnss_day.pdf:
> Use of uninitialized value in pattern match (m//) at
> D:/Perl/lib/IO/Handle.pm line 348.
> 5592 Warning - http://staging.nevp.com/fp/PDF/2004_dibetes_wllnss_day.pdf:
> Use of uninitialized value in concatenation (.) or string at
> D:/Perl/lib/IO/Handle.pm line 358.
> Problems with filter 'SWISH::Filters::Pdf2HTML=HASH(0x1ea2d60)'.  Filter
> disabled:
>  -> open2: Can't call method "close" on an undefined value at
> D:/Perl/lib/IPC/Open3.pm line 338.

Interesting.  That's not really giving any hints without going into
those lines and looking at what's happening.  Careful use of "print" at
those locations will likely give some clues.

Can you provide some sample files and a small sample config?  Maybe
others running on Windows can help test.
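
Since the error message names open2, a small standalone test of IPC::Open2 might help separate a Perl/Windows problem from a filter problem. The sketch below is only an assumption about what such a test could look like: run_filter is a hypothetical helper, and the child command is the running perl itself ($^X) rather than the real PDF converter.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: send $input to a child process and return
# everything the child writes to its standard output.
sub run_filter {
    my ( $input, @cmd ) = @_;
    my $pid = open2( my $from_child, my $to_child, @cmd );
    print {$to_child} $input;
    close $to_child;            # signal end of input to the child
    local $/;                   # slurp mode: read all output at once
    my $output = <$from_child>;
    close $from_child;
    waitpid $pid, 0;            # reap the child process
    return $output;
}

# The child upper-cases one line read from its standard input.
my $echo = run_filter( "hello\n", $^X, '-e', 'print uc(scalar <STDIN>)' );
print "child returned: $echo";
```

If this script fails with the same "close on an undefined value" message, the problem would seem to lie with open2 on that machine rather than with the spider or filter configuration.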


-- 
Bill Moseley
moseley@hank.org
Received on Thu Mar 18 10:03:44 2004