Hi all
I'm having the same problem as Thomas Dowling: pdftotext is inserting
unwanted spaces into the text it extracts from PDF documents. It's a
crippling problem for a database that aims for 100% accuracy.
Searching within the PDF itself works fine, but the text indexed by the
Swish-e based search is full of words with gaps between the letters, e.g.
"k o o t i" instead of "kooti".
We are running the latest version of Swish-e, with pdftotext 3.02.
I've only noticed it recently. It only seems to be a big problem with some
fonts, or perhaps with PDFs produced by newer versions of FineReader and
Adobe software.
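One possible stopgap, while the extraction problem itself is investigated, is to post-process the pdftotext output before Swish-e indexes it and join runs of single letters back into words. The sketch below is a hypothetical Python filter: the function name `collapse_letter_spacing` and the regex are illustrative assumptions, not part of Swish-e or pdftotext. It assumes the gaps are single spaces, and it would still need to be wired into whatever filter step passes pdftotext's output to the indexer.

```python
import re

# Hypothetical post-processing step (not part of Swish-e or pdftotext):
# matches a run of three or more single characters separated by single
# spaces, bounded by whitespace or the start/end of the text.
LETTER_SPACED_RUN = re.compile(r"(?<!\S)(?:\w ){2,}\w(?!\S)")


def collapse_letter_spacing(text: str) -> str:
    """Join letter-spaced runs such as 'k o o t i' back into 'kooti'.

    Heuristic: it will also merge genuine runs of three or more
    one-character words (e.g. 'a b c' becomes 'abc').
    """
    return LETTER_SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    sample = "Te K o o t i fought t h e colonial government"
    print(collapse_letter_spacing(sample))
    # prints: Te Kooti fought the colonial government
```

The requirement of at least three letters in a run is a guess intended to leave ordinary text such as "I am a man" untouched. It would not repair gaps that pdftotext renders as line breaks, or words split into multi-letter fragments.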
There is an example on our Early New Zealand books website at
http://www.enzb.auckland.ac.nz/
1. Click on Search >> Go to Advanced Search:
   http://www.enzb.auckland.ac.nz/advsearch.php?action=cs
2. Click on Limit by Title and click on the + sign beside "1887 - Gudgeon,
   T. W. The Defenders of New Zealand".
3. Tick the box beside [Pages 300-335].
4. Under Full Text, enter k o o t i in Search For and choose "This phrase"
   from the dropdown menu.
5. Click on 1[pages 300-335] to view the PDF. You can copy and paste text
   from the PDF with no gaps between the letters.
6. Try the same search for kooti, or for t h e or a n d.
N.B. Te Kooti is a famous Maori leader and prophet who led a bitter
struggle against the colonial government in New Zealand in the 1860s: an
antipodean Geronimo. The name is a transliteration into Maori of the
missionary name Coates.
John
********************************************
John Laurie
Digital Initiatives Librarian
Digital Services
Level 3, General Library
University of Auckland
Phone (09)3737599 x 85773
Email j.laurie@auckland.ac.nz
*************************************************
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Sun Jun 28 18:41:13 2009