"pdfinfo" and "pdftotext -meta" work fine for the "standard" fields (author,
subject, title, keywords, etc). I have those fields indexed and searchable
for many PDF files already.
My question is regarding custom PDF properties. Those are field:value pairs
that are stored inside the PDF files. Neither pdfinfo nor pdftotext is able
to extract those.
I understand how swish-e will be able to index that information; I just
don't know how to extract it from the PDF file.
Any pointers?
-----Original Message-----
From: users-bounces@lists.swish-e.org
[mailto:users-bounces@lists.swish-e.org] On Behalf Of Peter Karman
Sent: Wednesday, March 14, 2007 12:22 PM
To: Swish-e Users Discussion List
Subject: Re: [swish-e] PDF custom properties
Bill Crawford scribbled on 3/14/07 1:11 PM:
> On Wednesday 14 Mar 2007, Eric Jobidon wrote:
>
>> I have successfully been using the PDFToText utility to extract text
>> and "standard" metadata from PDF files. The tool does not, however,
>> offer the capability to export PDF custom properties.
>>
>> Does anyone know of an open source Linux CLI tool that allows the
>> extraction of PDF custom properties?
>
> When you say "standard" do you mean the title, subject etc? There's an
> option to pdfinfo "-meta" to extract additional metadata, but I don't
> have any PDFs with such to test it.
>
yes, that's how SWISH::Filters::Pdf2HTML does it, with pdfinfo.
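For illustration only, here is a minimal Python sketch of that general approach. It is not the actual SWISH::Filters::Pdf2HTML code. It assumes pdfinfo prints one "Key: value" pair per line, and that swish-e is configured with MetaNames matching the generated meta tag names. The function names and the sample text are hypothetical, and the sketch covers only the fields pdfinfo already reports, not custom properties.

```python
import html


def parse_pdfinfo(text):
    """Parse pdfinfo-style 'Key: value' lines into a dict.

    Only the first colon splits key from value, so values that
    contain colons (for example dates) are kept intact.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def to_html_meta(fields, body=""):
    """Render the fields as HTML <meta> tags around the extracted text."""
    metas = "\n".join(
        '<meta name="{}" content="{}">'.format(
            html.escape(key.lower().replace(" ", "_"), quote=True),
            html.escape(value, quote=True),
        )
        for key, value in fields.items()
    )
    return "<html><head>\n{}\n</head><body>{}</body></html>".format(
        metas, html.escape(body)
    )


# Hypothetical sample in the shape pdfinfo normally prints.
SAMPLE = """Title:          Annual Report
Author:         A. Writer
CreationDate:   Wed Mar 14 12:22:00 2007
"""

if __name__ == "__main__":
    print(to_html_meta(parse_pdfinfo(SAMPLE), body="extracted text here"))
```

In a real filter the sample text would be replaced by the captured output of pdfinfo, and the body by the output of pdftotext.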
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Mar 14 14:42:33 2007