Hi James,
On Sat, May 22, 2004 at 08:41:16PM -0700, Job, James wrote:
> However, about 25-50 documents into my crawl, I'd start seeing "Skipped
> whatever.doc due to filter 'filter_content' user supplied function #1.
So the filter started to fail.
>
> Looking at task manager, I would see a running "catdoc" or "pdftotext"
> process. After tearing my hair out for a while, I suspected there may be a
> threading issue (since I'm running a SMP system),
I don't know anything about SMP or threaded applications. Can you
explain why just having two CPUs would result in such a problem?
> and made some changes to
> the windows_fork subroutine in Filter.pm. I eventually had success with the
> following:
Good. I'll apply the patch, but I'd like to understand what's
happening.
> my $pid = IPC::Open2::open2($rdrfh, $wtrfh, @command );
>
> # --- BEGIN WIN32 SMP MODS
> # Wait for Process to complete before we continue (max 10 sec), else kill it!
> use POSIX ":sys_wait_h";
> my ($stiff, $tcks);
> $tcks = 0;
> while (($stiff=waitpid(-1,&WNOHANG))>0 && $tcks<9) {
> sleep 1;
> $tcks++;
> }
> if ($tcks>8) {
> $pid->Kill(9);
> }
> # --- END WIN32 SMP MODS
OK, so is that waiting on the just-run program? It seems like you would
want to do that after reading from the pipe. I would think the OS would
block the program until the pipe was read from, so it would always get
killed.
Or is it too late and I'm missing something obvious?
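To make the question concrete, here is a rough sketch of the ordering I have in mind. It is not the actual Filter.pm code and I have not tried it on Windows. The child command is a made-up stand-in for catdoc or pdftotext, and $output and $status are names I picked for illustration. It reads the pipe to EOF first, and only then waits on that one pid (waitpid with -1 waits for any child, not just this one).

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical stand-in for catdoc/pdftotext: a child perl that prints one line.
my @command = ( $^X, '-e', 'print qq{converted text\n}' );

my ( $rdrfh, $wtrfh );
my $pid = open2( $rdrfh, $wtrfh, @command );

# Nothing to send to the child, so close its stdin right away.
close $wtrfh;

# Read everything the child writes before waiting on it, so the child
# is never left blocked trying to write to a pipe nobody is reading.
my $output = do { local $/; <$rdrfh> };
close $rdrfh;

# Now reap this specific child (blocking wait) and pick up its exit status.
waitpid( $pid, 0 );
my $status = $? >> 8;

print "exit status: $status\n";
print $output;
```

If a timeout is still needed for a hung filter, I would guess it belongs around the read rather than before it, but that is an assumption on my part.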
Thanks,
BTW -- whatever happened with your other problem:
Warning: Failed to uncompress Property. zlib uncompress returned: -5.
uncompressed size: 140 buf_len: -1073746392
Did that go away after reindexing?
--
Bill Moseley
moseley@hank.org
Received on Sat May 22 22:12:53 2004