Re: pdftotext - erroring out

From: intervolved none <intervolved(at)not-real.yahoo.com>
Date: Thu Oct 24 2002 - 14:34:28 GMT
Thanks, Bill, for the response.
It happens with every PDF it runs against. I have downloaded PDFs from the web and tried to index them, and all of them fail.
I have run pdftotext.exe at the command line and it converts the files fine (I have not opened it in a hex editor to look for unprintable characters). By "fine" I mean that I see the text that was in the PDF file and there are no noticeable problems.
 
Bill Moseley <moseley@hank.org> wrote:
On Thu, 24 Oct 2002, intervolved none wrote:

> 
> (I hope that this is not double posted. I sent one email before being "signed up" and have not found my question in the archives.)
> 
> I am trying to index pdf files. I get the following error messages : 
> 
> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> 
> Error (202734): Unknown compression method in flate stream

It means your PDF file is damaged. You can try running with -v3 and see
which file is damaged.

A few days ago I modified pdftoinfo and pdftotext (error.cc IIRC) to abort
on errors and then modified the spider to print out the pdf file name when
it fails to convert. I've been told that a future version of xpdf will print
out the file name on error.
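
Until the tools report the file name themselves, one way to find the offending
file is to run the converter over each PDF separately and note which ones
complain. The following is only a rough sketch of that idea in Python, not the
actual change made to error.cc or to the spider. The function names
check_pdf and find_bad_pdfs are hypothetical, and treating a nonzero exit
status or any stderr line beginning with "Error" as a failure is an assumption
based on the messages quoted above.

```python
import subprocess
from pathlib import Path


def check_pdf(path, command=("pdftotext",)):
    """Run the converter on one file and return (ok, error_lines).

    `command` is the converter to invoke; it is a parameter so that a
    different binary (or a stand-in for testing) can be substituted.
    """
    # Passing "-" as the output file asks pdftotext to write to stdout.
    result = subprocess.run(
        [*command, str(path), "-"],
        capture_output=True,
        text=True,
        errors="replace",
    )
    # Assumption: problems show up as stderr lines starting with "Error",
    # as in "Error (0): PDF file is damaged ...".
    errors = [
        line for line in result.stderr.splitlines()
        if line.startswith("Error")
    ]
    ok = result.returncode == 0 and not errors
    return ok, errors


def find_bad_pdfs(directory, command=("pdftotext",)):
    """Return {file name: error lines} for each PDF that fails to convert."""
    bad = {}
    for pdf in sorted(Path(directory).glob("*.pdf")):
        ok, errors = check_pdf(pdf, command)
        if not ok:
            bad[pdf.name] = errors
    return bad
```

Calling find_bad_pdfs("docs") would then list each file in a (hypothetical)
docs directory that produced an error, together with the messages it printed,
which is roughly the information the modified spider reports.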

-- 
Bill Moseley moseley@hank.org





Received on Thu Oct 24 14:38:17 2002