Re: [swish-e] pdftotext

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Mar 13 2009 - 02:45:12 GMT

Michelangelo Rezzonico wrote on 3/10/09 5:23 AM:
> Hi all,
> 
> I use pdftotext to index pdf-files.
> This works ok.
> The only problem is that in the output of pdftotext there are many spaces.
> 
> If in the pdf-file there is the string "2001", then in the output of
> pdftotext I find "2 0 0 1".
> 

I suspect that the formatting in your PDF is such that pdftotext is trying to
mimic it with spaces.

This thread seems off-topic for this list, but as there are likely lots of
pdftotext users here, try putting an example file somewhere public (attachments
to this list likely won't work) and you might find someone with a version that
does what you want.

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Thu Mar 12 22:45:08 2009