Re: Indexing PDFs on Windows - Revisited....

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Sep 25 2004 - 13:25:02 GMT
On Fri, Sep 24, 2004 at 11:32:26PM -0700, Anthony Baratta wrote:
> 
> I rummaged through the code and discovered someone had kindly added
> 
> $self->{pid} = $pid;
> 
> in the windows_fork of Filter.pm. But I didn't find any references to 
> "waitpid".

Ah, we have been through this before.  waitpid is called in swish.cgi
and I thought it got added to Filter.pm after this discussion:

  http://swish-e.org/Discussion/search/swish.cgi?query=%22thread+safety%22&submit=Search%21&metaname=swishtitle&sort=swishlastmodified

Looks like the discussion didn't finish.

> In "$filter_sub = sub { ... " (Approx. line 1051 in spider.pl), I added 
> "waitpid($doc->{pid},0);" just after "my $doc = $filter->convert( .." 
> and before "return 1 unless $doc;"

But that won't really work because, as in the case of the PDF
filter, two programs are being run, and $self->{pid} only holds the
PID of the last one.

The correct solution is to make the call to windows_fork (the call to
IPC::Open2) return an object and then have a DESTROY function that
calls waitpid.
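
A rough sketch of that idea, not the actual Filter.pm code. The class
name ReapedChild and its accessors are made up for illustration; the
only real calls are IPC::Open2::open2 and waitpid. Each program run gets
its own object, so a filter that runs two programs would hold two
objects and both children are reaped when they go out of scope.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical wrapper (not part of SWISH::Filter): owns the pid and the
# handles of one child started with open2, and reaps it on destruction.
package ReapedChild;

sub new {
    my ( $class, @cmd ) = @_;
    my ( $rdr, $wtr );

    # open2 autovivifies the two handles and returns the child's pid
    my $pid = IPC::Open2::open2( $rdr, $wtr, @cmd );
    return bless { pid => $pid, rdr => $rdr, wtr => $wtr }, $class;
}

sub pid    { $_[0]{pid} }
sub reader { $_[0]{rdr} }
sub writer { $_[0]{wtr} }

sub DESTROY {
    my $self = shift;
    local ( $?, $! );    # don't clobber the caller's status variables

    # close our ends first so a child waiting on input can exit
    close $self->{wtr} if $self->{wtr};
    close $self->{rdr} if $self->{rdr};
    waitpid $self->{pid}, 0 if $self->{pid};
}

package main;

{
    # $^X is the running perl, used here only as a portable test program
    my $child = ReapedChild->new( $^X, '-e', 'print qq{ok\n}' );
    my $fh    = $child->reader;
    my $line  = <$fh>;
    print $line;
}    # $child goes out of scope here, so DESTROY calls waitpid
```

The caller never has to remember to call waitpid; it just has to let the
object go away after reading the output.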

Another way might be to save all the PIDs.  So in the windows_fork()
function:

    push @{$self->{pid}}, $pid;

Then in convert()

        eval {
            local $SIG{__DIE__};
            $filtered_doc = $filter->filter($doc_object);
        };

        # clean up Windows process table
        if ( ref $doc_object->{pid} ) {
            waitpid $_, 0 for @{ $doc_object->{pid} };
            delete $doc_object->{pid};
        }

Can you test that one?  I'm not sure how long it takes to test --
maybe you could create a list of links to a bunch of small PDFs on
your local machine so it will run fast.

Or, if you can figure out how to use Win32::Process, you could avoid
IPC::Open2 completely.

I wonder what happens on Win98.  I thought I tried there once and $pid
was always the same number.



-- 
Bill Moseley
moseley@hank.org
Received on Sat Sep 25 06:25:20 2004