Catdoc - following on from the Open2 problems in Win 2000

From: <Allan_Watts(at)not-real.amp.com.au>
Date: Fri Jan 23 2004 - 04:59:59 GMT
Hi

I'm still experimenting with ways to get around my Open2 problem (the
"open2: IO::Pipe: Can't spawn-NOWAIT:..." error after reading 64 files).

To recap, waitpid didn't solve the problem in Filter.pm for me. I sometimes
get the same NOWAIT error after reading 65 files (an improvement of 1), but
it is erratic; mostly the command prompt session freezes.

However, it did work in this test script (reading 1000 files named
0000.doc to 0999.doc, collected from my C: drive and copied into the
directory c:\cat):

use strict;
use warnings;
use IPC::Open2;

for my $k (0 .. 999)
{
      my $filename = sprintf "c:/cat/%04d.doc", $k;
      #my $command = "c:\\data\\swish\\catdoc\\catdoc.exe $filename";  # version 0.93.3
      my $command = "c:\\progra~1\\swish-e\\lib\\swish-e\\catdoc.exe $filename"; # Dave's version(?)

      my ($rdrfh, $wtrfh);
      my $pid = open2($rdrfh, $wtrfh, $command);
      close $wtrfh;                  # catdoc reads the file named on its command line, not stdin
      binmode $rdrfh, ':crlf';

      # Read all of catdoc's output before waiting for it: calling waitpid
      # first can block forever once the output exceeds the pipe buffer.
      my $content = do { local $/; <$rdrfh> };
      close $rdrfh;
      waitpid $pid, 0;               # reap the child

      my $mtime  = (stat $filename)[9];
      my $size = length $content;

      print <<EOF;
Content-Length: $size
Last-Mtime: $mtime
Document-Type: TXT*
Path-Name: $filename

EOF
      print $content;
}


It worked except for a number of particular files.  I now seem to be
getting tangled up in catdoc/Win32 issues.

I tried two versions of catdoc.

The first was the one that came with Swish-e 2.4. (Sounds like Dave did
some good work with this to get it to read long file names.)  Unfortunately,
for a few of my Word documents it produced only a string of question marks
when run from the command line, or sometimes some text followed by a string
of question marks. When called while indexing, it seemed to cause swish-e
to hang on one of these files.

I downloaded version 0.93.3 of catdoc from
http://www.45.free.net/~vitus/ice/catdoc/

This seemed to work better (except that it couldn't handle long filenames).
It also couldn't handle 10 of my files, giving a "Bad BBD entry!" error and
freezing (in 9 out of 10 cases).  The files it didn't work on were large
files (20MB) with lots of embedded JPEG images (the staff newsletter!).
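
One way to sidestep the long-filename limitation is to hand catdoc a copy
of the document under a short name. The following is only a sketch: the
helper name short_copy is hypothetical, and it assumes the limitation is
about names that do not fit the 8.3 pattern, which is not confirmed here.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempfile);

# Hypothetical helper: copy $src into $dir under a short 8.3-style name
# and return the new path. The caller deletes the copy when finished.
sub short_copy {
    my ($src, $dir) = @_;

    # 'cdXXXXXX' plus '.doc' gives an 8-character base name and extension;
    # File::Temp replaces the X characters with random ones.
    my ($fh, $tmp) = tempfile('cdXXXXXX', SUFFIX => '.doc', DIR => $dir);
    close $fh;

    copy($src, $tmp) or die "copy $src -> $tmp failed: $!";
    return $tmp;
}
```

A caller would run catdoc on the returned path and then unlink it, while
still reporting the original name in the Path-Name header.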

I guess I just battle on. (I am getting around the long filenames by copying
the file somewhere else first, and I now have a list of files that I
will ignore.)  Any suggestions appreciated (e.g. a way to trap errors
from catdoc).
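
A minimal sketch of one way to trap errors from a filter such as catdoc.
The helper name run_filter is hypothetical, and the sketch assumes catdoc
reports failure through a non-zero exit code, which is not confirmed here.
It does not deal with a catdoc that hangs; that would need a separate
timeout mechanism.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: run a filter command given as a list (program
# followed by its arguments) and return (output, exit code, error text).
sub run_filter {
    my (@cmd) = @_;
    my ($rdrfh, $wtrfh);

    # open2 raises an exception if the program cannot be started,
    # so trap that with eval instead of letting the indexer die.
    my $pid = eval { open2($rdrfh, $wtrfh, @cmd) };
    return (undef, undef, "spawn failed: $@") unless $pid;

    close $wtrfh;                              # nothing to send on stdin
    my $content = do { local $/; <$rdrfh> };   # read all output first
    close $rdrfh;

    waitpid $pid, 0;                           # reap the child; sets $?
    my $exit = $? >> 8;                        # exit code of the child
    return ($content, $exit, $exit ? "exit code $exit" : undef);
}
```

The caller can then skip a document and log its name whenever the third
return value is defined, rather than maintaining the ignore list by hand.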

Allan.

Received on Thu Jan 22 21:00:02 2004