Re: Swish-e PDF titles in search results

From: Luke Simmons <lukes(at)not-real.deeson.co.uk>
Date: Thu Jul 13 2006 - 14:35:28 GMT

Hi Bill,

Thanks for your lightning response, much appreciated!

OK, I removed the line, re-indexed, and ran a search via the CGI
script. The excerpt shown on the results page was as follows:

%PDF-1.5 % 1 0 obj> endobj 2 0 obj>stream 4294967295 2126 3183  
Acrobat Distiller 6.0 for Macintosh 1 300/1 300/1 2  
2006-01-09T12:01:59Z 2006-01-09T12:01:59Z 2006-01-09T12:01:59Z Adobe  
Photoshop CS Macintosh uuid:e40b4bbf-8296-11da ...

Is this the stored description? My best guess is that what's being
indexed is the PDFs' metadata, in HTML form. Is that correct?

When I run:

[root (at) tiger archive]# /usr/local/lib/swish-e/DirTree.pl edjanfeb06.pdf |
    swish-e -S prog -i stdin -c ../../cgi-bin/archswish/swish.conf -v0 -T properties

I get:

        swishdocpath: 6 ( 16) S: "./edjanfeb06.pdf"
        swishtitle: 7 ( 10) S: "Jan Feb 06"
        swishdocsize: 8 (  4) N: "140528"
        swishlastmodified: 9 (  4) D: "2006-07-13 10:51:56 BST"
        swishdescription:11 (138904) S: "Engi.....etc "

I then ran:

[root (at) tiger archive]# /usr/local/lib/swish-e/DirTree.pl edjanfeb06.pdf | grep title

and got

<meta name="title" content="Jan Feb 06">

but no <title>Jan Feb 06</title> element.
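
To tell the two cases apart without reading the grep output by eye, something like the following could be used. It is only a minimal sketch: title_source is a hypothetical helper name of my own, not part of Swish-e, and it assumes the converted HTML arrives on stdin.

```shell
# Hypothetical helper (not part of Swish-e): report which kind of title,
# if any, appears in the HTML read from stdin.
title_source() {
    html=$(cat)
    if printf '%s\n' "$html" | grep -qi '<title>'; then
        # A real <title> element is present.
        echo "title element"
    elif printf '%s\n' "$html" | grep -qi '<meta name="title"'; then
        # Only the meta tag carries the title.
        echo "meta only"
    else
        echo "none"
    fi
}

# Intended use (assumes the same DirTree.pl invocation as above):
#   /usr/local/lib/swish-e/DirTree.pl edjanfeb06.pdf | title_source
```

Given the grep result above, I would expect this to report "meta only" for edjanfeb06.pdf.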

Do I need to add a title method to SWISH::Filters::Pdf2HTML.pm?
Or is there something I should add to the Swish-e config file so that
it handles the converted PDF data (i.e. the HTML)?


Thanks

Luke
Received on Thu Jul 13 07:35:29 2006