I've been using xpdf's pdftotext. I know the conversion isn't perfect, but it isn't as bad as it could be (some tools do a horrible job).
After doing the conversion, I usually run a Perl script to correct some frequent mistakes:
- w o r d -> spaces between letters.
- strato- various -> hyphen and space where a long word was split across lines.
Even with these corrections the text isn't perfect, but it works for me.
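
For anyone who wants to try the same kind of cleanup, here is a rough sketch of the two fixes listed above. It is written in Python rather than Perl and is not the actual script; the function names and the regular expressions are illustrative assumptions, and the rules are heuristics that will sometimes misfire (for example, a genuine compound like "well- known" would be joined into one word, and a lone "a" directly before a spaced-out word gets absorbed into it).

```python
import re

# Three or more single letters separated by single spaces, e.g. "w o r d".
# (Assumed heuristic: two letters alone, like "a b", are left untouched.)
_SPACED_LETTERS = re.compile(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b")

# A word fragment ending in a hyphen, followed by whitespace (space or
# newline) and a lowercase continuation, e.g. "strato- various".
_BROKEN_HYPHEN = re.compile(r"([A-Za-z]+)-\s+([a-z]+)")


def join_spaced_letters(text):
    """Collapse 'w o r d' into 'word'."""
    return _SPACED_LETTERS.sub(lambda m: m.group(0).replace(" ", ""), text)


def join_hyphenated(text):
    """Rejoin a word that was hyphenated across a line break."""
    return _BROKEN_HYPHEN.sub(r"\1\2", text)


def clean(text):
    """Apply both corrections to pdftotext output held in a string."""
    return join_hyphenated(join_spaced_letters(text))


if __name__ == "__main__":
    import sys
    # Usage (assumed): pdftotext file.pdf - | python clean.py
    sys.stdout.write(clean(sys.stdin.read()))
```

Reading from standard input lets it sit at the end of a pipe after pdftotext, which accepts "-" as the output file name to write to stdout.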
If anyone knows a better tool please post it to the list.
>Subject: [SWISH-E] pdf to ...
> From: Peter Karman <email@example.com>
> Date: Thu, 3 Mar 2005 18:42:37 -0800 (PST)
> To: Multiple recipients of list <firstname.lastname@example.org>
>A little off-topic, but since many here deal with converting PDF to other
>formats, thought I'd start here.
>I haven't been that happy with the xpdf package. It fails to convert some PDF
>files legibly (using pdftotext), and support is a one-man show (always risky, imho).
>I have had more success with the ps2ascii tool from Ghostscript and I wonder
>what others have found.
>consider this a PDF tool survey...
>Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Thu Mar 3 23:37:47 2005