Re: Swish-e PDF titles in search results

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Jul 13 2006 - 13:46:12 GMT
On Thu, Jul 13, 2006 at 06:40:02AM -0700, Luke Simmons wrote:
> [root (at) tiger archive]# /usr/local/lib/swish-e/DirTree.pl  
> edjanfeb06.pdf | head -30
> 
> <head>
> <meta name="author" content="A person">
> <meta name="creationdate" content="Tue Jan  3 11:10:41 2006">
> <meta name="creator" content="QuarkXPress: pictwpstops filter 1.0">
> <meta name="encrypted" content="no">
> <meta name="file_size" content="2711063 bytes">
> <meta name="moddate" content="Thu Jul 13 10:42:32 2006">
> <meta name="optimized" content="no">
> <meta name="page_size" content="595 x 842 pts (A4)">
> <meta name="pages" content="36">
> <meta name="pdf_version" content="1.5">
> <meta name="producer" content="Acrobat Distiller 6.0.1 for Macintosh">
> <meta name="tagged" content="no">
> <meta name="title" content="Jan Feb 06">

So it's clear that DirTree.pl is converting the PDF to HTML, right?

Yet when you run the indexer you get this, which indicates a broken PDF:

> Error: May not be a PDF file (continuing anyway)
> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> Error: Couldn't find trailer dictionary
> Error: Couldn't read xref table

The reason is that you are taking the converted PDF file
(which is now HTML) and telling swish to run it through pdftotext:


> 	FileFilter .pdf /usr/local/bin/pdftotext   "'%p' -"

Get rid of that line.
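
To illustrate why pdftotext complains, here is a minimal Python sketch of
the kind of header check a PDF reader performs. A real PDF starts with a
"%PDF-" marker, while the output of DirTree.pl starts with HTML. The
function name and the sample byte strings are hypothetical, made up for
this illustration, and the 1024-byte window is an assumption about how
lenient a reader is with leading junk. It is not pdftotext's actual code.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if a PDF header marker appears near the start of data."""
    # PDF files begin with a "%PDF-" header. Assumption: the reader
    # tolerates some leading junk, so scan the first 1024 bytes.
    return b"%PDF-" in data[:1024]


# Hypothetical samples: the start of an original PDF, and the start of
# the HTML that the conversion step has already produced from it.
original = b"%PDF-1.5\n1 0 obj\n"
converted = b'<head>\n<meta name="title" content="Jan Feb 06">\n'

print(looks_like_pdf(original))   # True
print(looks_like_pdf(converted))  # False
```

The second case is what the FileFilter line hands to pdftotext: content
that has already been converted, so the header is gone and the xref table
and trailer cannot be found either.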

-- 
Bill Moseley
moseley@hank.org

Received on Thu Jul 13 06:46:12 2006