Not all PDF files are searchable. It will depend on how the PDF was
created.
There are 3 types of PDFs: PDF Normal, PDF Searchable Image and PDF
Image Only. See http://www.dclab.com/pdf_conversion.asp for more
information about these different types.
James H. Cutts III
CORI - 143C Mumford
-----Original Message-----
From: swish-e@sunsite3.berkeley.edu
[mailto:swish-e@sunsite3.berkeley.edu] On Behalf Of David Larkin
Sent: Thursday, December 08, 2005 4:43 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Indexing PDF files - reliable ?
On Thu, 8 Dec 2005 13:42:31 -0800 (PST)
Bill Moseley <moseley@hank.org> wrote:
> On Thu, Dec 08, 2005 at 01:05:59PM -0800, David Larkin wrote:
> > Is it due to the PDF version number?
>
> Swish uses pdftotext. Run that on the docs and see what comes out.
>
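Following Bill's suggestion, here is a minimal sketch that runs pdftotext on each PDF and flags the ones that yield no text at all. That is the symptom you would expect from an "Image Only" PDF. It assumes pdftotext (xpdf or poppler-utils) is on PATH. The helper names `count_text_chars` and `has_text_layer` and the script name are hypothetical, and "at least one non-whitespace character" is an arbitrary threshold.

```shell
#!/bin/sh
# Minimal sketch: flag PDFs from which pdftotext extracts no text.
# Assumes pdftotext (xpdf or poppler-utils) is on PATH.
# Usage: sh check-pdf-text.sh *.pdf   (script name is hypothetical)

# count_text_chars: count non-whitespace characters read from stdin.
# pdftotext separates pages with form feeds; [:space:] removes those too.
count_text_chars() {
    tr -d '[:space:]' | wc -c | tr -d ' '
}

# has_text_layer FILE: succeeds if pdftotext extracts at least one character.
# A missing tool or unreadable file also yields 0, i.e. "no text".
has_text_layer() {
    n=$(pdftotext -q "$1" - 2>/dev/null | count_text_chars)
    [ "${n:-0}" -gt 0 ]
}

for f in "$@"; do
    if has_text_layer "$f"; then
        echo "text:    $f"
    else
        echo "no text: $f"
    fi
done
```

Passing `-` as the output file makes pdftotext write to stdout, so no intermediate .txt files are left behind.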
79:{david}% grep the Samba-Developers-Guide.txt | wc -l
206
80:{david}% grep the spm.txt | wc -l
437
81:{david}% grep the isj2001-final.txt | wc -l
0
82:{david}%
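`grep the FILE | wc -l` counts lines that contain the substring "the", so words like "other" and "there" also match. It is equivalent to `grep -c the FILE`. The 0 for isj2001-final.txt therefore means that no line of the extracted text contains "the" anywhere. The sketch below contrasts that line count with a whole-word count. `lines_with` and `word_count` are hypothetical helper names, and grep's `-o` and `-w` options are GNU/BSD extensions, not POSIX.

```shell
#!/bin/sh
# Sketch: "lines containing WORD" versus "whole-word occurrences of WORD".
# grep -o and -w are GNU/BSD extensions, not POSIX.

# lines_with WORD FILE: lines containing WORD as a substring
# (same result as `grep WORD FILE | wc -l`)
lines_with() {
    grep -c -- "$1" "$2"
}

# word_count WORD FILE: case-insensitive, whole-word occurrences of WORD
word_count() {
    grep -o -w -i -- "$1" "$2" | wc -l | tr -d ' '
}

# e.g.: lines_with the spm.txt; word_count the spm.txt
```

For a quick searchability check either count is fine. What matters is whether it is zero.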
isj2001-final.txt looks very strange; I wonder if the original PDF came
from a scanner or some such thing.
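If a PDF came from a scanner without OCR, it usually contains page images but no fonts. That gives a second check besides running pdftotext. The sketch below uses pdffonts from the same xpdf/poppler toolset. It assumes pdffonts prints a two-line header followed by one line per font, which is its usual layout. `font_lines` and `looks_scanned` are hypothetical names. A "PDF Searchable Image" file normally carries an OCR text layer with its own font, so this heuristic only singles out the "Image Only" kind.

```shell
#!/bin/sh
# Sketch: flag PDFs that reference no fonts (likely scanned, no OCR layer).
# Assumes pdffonts (xpdf or poppler-utils) is on PATH and prints a
# two-line header followed by one line per font.

# font_lines: count font entries in pdffonts output read from stdin
font_lines() {
    tail -n +3 | grep -c .
}

# looks_scanned FILE: succeeds if no fonts are listed.
# Caveat: if pdffonts is missing or fails, the output is empty and the
# file is also reported as having no fonts.
looks_scanned() {
    n=$(pdffonts "$1" 2>/dev/null | font_lines)
    [ "${n:-0}" -eq 0 ]
}

for f in "$@"; do
    if looks_scanned "$f"; then
        echo "no fonts (likely scanned): $f"
    fi
done
```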
> --
> Bill Moseley
> moseley@hank.org
>
> Unsubscribe from or help with the swish-e list:
> http://swish-e.org/Discussion/
>
> Help with Swish-e:
> http://swish-e.org/current/docs
> swish-e@sunsite.berkeley.edu
>
Received on Thu Dec 8 14:50:09 2005