Re: Storing Descriptions for PDF Files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Sep 27 2002 - 18:59:26 GMT
At 11:46 AM 09/27/02 -0700, Bill Moseley wrote:
>At 02:33 PM 09/27/02 -0400, Jeffrey.Grunstein@ny.frb.org wrote:
>>
>>We have lots of PDF files and some of them are very big.  None have a
>>metadata description set (the people
>>who created them are lazy).
>>
>>Will any of the filters parse the document and take the first n characters,
>>like what StoreDescription does?
>
>Yes.
>
>In the simple case you can do something like:
>
>    FileFilter .pdf pdftotext "'%p' -"
>    IndexContents TXT .pdf
>    StoreDescription TXT 1000

BTW -- if your PDF files are *very* large then you might try using:

 http://swish-e.org/current/docs/SWISH-CONFIG.html#item_TruncateDocSize

I haven't used that directive in a long time, so let us know if anything
blows up...

And you also might try using TXT2 instead of TXT.  TXT reads the entire doc
into memory, whereas TXT2 reads it in chunks.  So using TruncateDocSize with
TXT2 might avoid reading more data than you need.
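
Putting both suggestions together with the config quoted above, an untested
sketch might look like the following.  The 100000 limit is an arbitrary
illustrative value, and it assumes that StoreDescription accepts TXT2 the
same way it accepts TXT and that swish-e was built with libxml2 (which the
TXT2 parser needs), so check both against SWISH-CONFIG before relying on it:

    # Convert each PDF to plain text on stdout
    FileFilter .pdf pdftotext "'%p' -"
    # Parse the filtered output with the chunked-reading TXT2 parser
    IndexContents TXT2 .pdf
    # Store the first 1000 characters as the description (assumed TXT2 form)
    StoreDescription TXT2 1000
    # Stop reading each document after this size (illustrative value only)
    TruncateDocSize 100000

Note that truncating also limits what gets indexed, not just what is stored
as the description, so text past the limit won't be searchable.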


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Fri Sep 27 19:02:57 2002