
RE: & Windows Thread safety

From: Job, James <JJob(at)not-real.ESD.WA.GOV>
Date: Mon May 24 2004 - 17:33:07 GMT

Sorry, I should be more precise... 
I SUSPECT an SMP issue.

What I observed was that the filter child processes (CATDOC.EXE or
PDF2HTML.EXE) were not properly terminating, and I could still see them
sitting idle in Task Manager.

My first attempt was to simply add a waitpid(-1, &WNOHANG) call in order to
have Perl stand by while the child finishes and exits properly.
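For reference, a *non-blocking* waitpid does not actually make the parent
stand by: it returns immediately with a pid of 0 while the child is still
running, and only reaps the child once it has exited. The following is a
small POSIX-only Python illustration of that behavior (os.fork, so
Unix-likes only; this is not part of the patch):

```python
import os
import time

pid = os.fork()
if pid == 0:
    time.sleep(2)      # child: stay alive briefly, then exit
    os._exit(0)

# Non-blocking check while the child is still alive:
# waitpid(..., WNOHANG) returns (0, 0) instead of waiting.
result = os.waitpid(pid, os.WNOHANG)
print(result)

time.sleep(3)          # let the child exit (it is now a zombie)

# The child has exited, so this non-blocking call reaps it.
result = os.waitpid(pid, os.WNOHANG)
print(result[0] == pid)
```

This is why the patch loops with a sleep rather than calling waitpid once.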

I found that this would occasionally stop the spider permanently
(the child wouldn't exit on its own).
So- adding the 10 second timeout seemed to do the trick, and subsequent
crawls did not fail. 
I'm running a Dual Opteron 240 SMP/NUMA system running Windows 2003 Standard
Server(32bit) (2gb memory, split 1GB per CPU).

This link may explain the concern better: 
Basically, it appears that open2() does NOT wait for, nor reap, the child
process after it terminates.  They recommend you use waitpid to prevent
zombie processes.  But since you can't wait forever, we implement a
reasonable counter and kill the process if it doesn't finish.  Perhaps
adding a WARN line to the output when we kill would let us better observe
the flow (and see whether the affected document was indexed at all).
I am no Perl expert (I've only been doing this a few days now), but it does
appear to work.  We're just waiting for the filter executable to complete,
and killing it if it hasn't after 10 seconds.  The output is being indexed,
so the output pipe is still readable after the process terminates.
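The pattern described above (give the filter a bounded wait, then kill it
if it overruns) can be sketched as follows.  This is a Python illustration
of the idea, not the actual Perl patch; the command and the 10-second
timeout are taken from the description above:

```python
import subprocess

def run_filter_with_timeout(command, timeout_seconds=10):
    """Run an external filter, give it at most timeout_seconds to
    finish, and kill it if it overruns (mirrors the 10-second guard
    described in the message)."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        # communicate() drains the pipe while waiting, so a chatty
        # child can't deadlock writing to a full pipe buffer.
        output, _ = proc.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                      # child never exited: kill it
        output, _ = proc.communicate()   # reap it so no zombie is left
    return output

# Example with a trivial stand-in "filter" that exits immediately.
print(run_filter_with_timeout(["echo", "hello"]))
```

Either way the child is reaped, which is the zombie-prevention point the
open2() documentation makes.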

Before merging the code in, I think it would be important to validate it on
other systems.  It may be a good hack for SMP/NUMA on Win32, but I have no
idea how it will behave on UP and non-NUMA systems (test, test, test).
Fortunately, the change only affects Windows systems.

My ZLIB problem is on hold for the moment.  I shifted from Suse Enterprise
for AMD64 to Win2003/32bit in order to get SWISH-E up and running quickly
(have a deadline to meet), but I won't forget about the 64bit/Linux Opterons
though...  Fortunately, I've got a couple Opterons set aside for test & dev,
so I can return to it when I have time.

I seem to remember seeing some discussion in the archives about filter
processes stalling in the past (only getting 64 .doc(s) or something like
that).  People experiencing those problems may want to try this hack.  I had
a similar result, only getting 23 docs and 55 pdfs indexed from my stock
crawls.  I now get 55 docs and 335 pdfs (and none skipped due to filter
errors).
Hope this helps. 
James Job 
-----Original Message----- 
From: Bill Moseley [] 
Sent: Saturday, May 22, 2004 10:12 PM 
To: Job, James 
Cc: Multiple recipients of list 
Subject: Re: & Windows Thread safety 

Hi James, 
On Sat, May 22, 2004 at 08:41:16PM -0700, Job, James wrote: 
> However, about 25-50 documents into my crawl, I'd start seeing 
> "Skipped whatever.doc due to filter 'filter_content' user supplied 
> function #1. 
So the filter started to fail. 

> Looking at task manager, I would see a running "catdoc" or "pdftotext" 
> process.  After tearing my hair out for a while, I suspected there may 
> be a threading issue (since I'm running a SMP system), 
I don't know anything about SMP or threaded applications.  Can you explain
why just having two CPUs would result in such a problem?
> and made some changes to 
> the windows_fork subroutine in  I eventually had success 
> with the 
> following: 
Good.  I'll apply the patch, but I'd like to understand what's happening. 
>     my $pid = IPC::Open2::open2($rdrfh, $wtrfh, @command ); 
>     # --- BEGIN WIN32 SMP MODS 
>     # Wait for the process to complete before we continue (max 10 sec),
>     # else kill it!
>     use POSIX ":sys_wait_h"; 
>     my ($stiff, $tcks); 
>     $tcks = 0; 
>     while (($stiff=waitpid(-1,&WNOHANG))>0 && $tcks<9) { 
>       sleep 1; 
>       $tcks++; 
>       } 
>     if ($tcks>8) { 
>       $pid->Kill(9); 
>       } 
>     # --- END WIN32 SMP MODS 
OK, so is that waiting on the just-run program?  Seems like you would want
to do that after reading from the pipe.  I would think the OS would block
the program until the pipe was read from -- so it would always get killed.
Or is it too late and I'm missing something obvious? 
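The ordering concern raised here is real on any platform: if the parent
waits before draining the pipe, a child that writes more than one pipe
buffer (often around 64 KB) blocks on write and never exits, so a timeout
guard would always fire.  A minimal Python sketch of the safe ordering
(assumes a Unix-like system with python3 on PATH; not from the patch):

```python
import subprocess

# Spawn a child that writes more than a pipe buffer can hold (~64 KB).
proc = subprocess.Popen(
    ["python3", "-c", "import sys; sys.stdout.write('x' * 200000)"],
    stdout=subprocess.PIPE,
)

# Safe order: drain the pipe FIRST, then wait.  If we called
# proc.wait() before reading, the child would block writing to the
# full pipe and the wait would never return.
data = proc.stdout.read()
proc.wait()

print(len(data))  # 200000
```

Reversing the two steps (wait, then read) deadlocks in exactly the way
described above once the child's output exceeds the pipe buffer.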

BTW -- whatever happened with your other problem: 
  Warning: Failed to uncompress Property. zlib uncompress returned: -5. 
  uncompressed size: 140 buf_len: -1073746392 
Did that go away after reindexing? 

Bill Moseley 

Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
Received on Mon May 24 10:33:08 2004