Re: Displaying a filtered PDF's title in <swishtitle>

From: Bill Moseley <moseley(at)>
Date: Tue Oct 05 2004 - 23:53:52 GMT
On Tue, Oct 05, 2004 at 04:44:12PM -0700, Tim Hartley wrote:
> Hi all,
> I'm using the File System index method to create an index of about
> 450 pdf files. I can successfully do this, however it returns
> 'pdf_file_name.pdf' as the title, and I need it to display the
> actual pdf's title.

You can use pdfinfo to extract the title from the PDF.
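
A rough sketch of that step in Python, assuming pdfinfo (from xpdf or poppler) is on the PATH and prints a "Title:" line when the PDF has one. The helper names parse_title and pdf_title are made up for illustration and are not part of swish-e.

```python
import subprocess


def parse_title(pdfinfo_output, fallback=""):
    """Return the value of the 'Title:' line in pdfinfo output, or fallback."""
    for line in pdfinfo_output.splitlines():
        # Split on the first colon only, so titles containing ':' survive.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Title":
            return value.strip() or fallback
    return fallback


def pdf_title(path):
    """Run pdfinfo on a PDF; fall back to the path if no title is found."""
    # Assumption: 'pdfinfo' is installed and reachable on the PATH.
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=False
    )
    return parse_title(result.stdout, fallback=path)
```

Falling back to the path keeps the current behaviour for PDFs that have no title set.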

> This is sort of discussed at these archives
> ( and
>, but I'm assuming I can't call
> a 'filter_content' using the File System method, and I don't know
> how to tweak the PDF2HTML filter file to do what I need..

Why not use and let it use SWISH::Filter to handle this?
It will likely be faster than using the file system method and calling
the Perl script for every document.
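
To give an idea of what a program run with -S prog has to do: as I understand the format, it writes each document to standard output as a few headers (Path-Name, Content-Length in bytes, optionally Document-Type), a blank line, then the content. A minimal Python sketch under that assumption; wrap_html and prog_record are hypothetical names, and the title and text are assumed to have been extracted already.

```python
import html


def wrap_html(title, body_text):
    """Wrap extracted text in minimal HTML so the title lands in <title>."""
    return (
        "<html><head><title>{}</title></head>\n"
        "<body><pre>{}</pre></body></html>\n"
    ).format(html.escape(title), html.escape(body_text))


def prog_record(path, html_doc):
    """Build one document record: headers, a blank line, then the content."""
    body = html_doc.encode("utf-8")
    # Content-Length must be the size of the content in bytes, not characters.
    header = (
        "Path-Name: {}\n"
        "Content-Length: {}\n"
        "Document-Type: HTML*\n"
        "\n"
    ).format(path, len(body))
    return header.encode("utf-8") + body
```

A script built on this would write the records to sys.stdout.buffer and be run with something like "swish-e -S prog -i ./feed_pdfs.py -c your.config" (the script name is a placeholder).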

I suppose you could use swish-filter-test to do the work for you (but
it would be very slow).

   FileFilter swish-filter-test "-content -quiet '%p'" .pdf

I'd recommend using, though.

> ---Begin Config File (pdf_file_test.config)---
> #Name & location of the index file created by this search configuration
> IndexFile c:\swish-e\pdfTestIndex.index
> IndexDir C:\Inetpub\wwwroot\\planetpdf\pdfs
> IndexOnly .pdf
> #Dont index anything other than the PDF directory
> FileMatch pathname contains pdfs
> IndexReport 3
> FilterDir C:/SWISH-E/lib/swish-e/perl/SWISH/Filters
> FileFilter ./pdf2html "'%p' -" /\\.pdf$/
> IndexContents HTML* .pdf .PDF
> StoreDescription HTML* <description> 200000
> PropertyNameAlias swishdescription title
> #Replace the pathname with a url
> ReplaceRules Replace "C:/Inetpub/wwwroot/" ""
> #run on cmd line: swish-e -S fs -c pdf_file_test.config
> ---End Config file--- 
> Any suggestions would be greatly appreciated!
> -Tim

Bill Moseley

Received on Tue Oct 5 16:54:04 2004