Hi Bill,
Thanks for your lightning-fast response, much appreciated!
OK, I removed the line, re-indexed, and ran a search via the CGI. The
web page excerpt was as follows:
%PDF-1.5 %âãÏÓ 1 0 obj> endobj 2 0 obj>stream 4294967295 2126 3183
Acrobat Distiller 6.0 for Macintosh 1 300/1 300/1 2
2006-01-09T12:01:59Z 2006-01-09T12:01:59Z 2006-01-09T12:01:59Z Adobe
Photoshop CS Macintosh uuid:e40b4bbf-8296-11da ...
Is this the stored description? My best guess is that what's being
indexed is the metadata for the PDFs, in HTML form. Is this correct?
When I run:
[root@tiger archive]# /usr/local/lib/swish-e/DirTree.pl edjanfeb06.pdf | swish-e -S prog -i stdin -c ../../cgi-bin/archswish/swish.conf -v0 -T properties
I get:
swishdocpath: 6 ( 16) S: "./edjanfeb06.pdf"
swishtitle: 7 ( 10) S: "Jan Feb 06"
swishdocsize: 8 ( 4) N: "140528"
swishlastmodified: 9 ( 4) D: "2006-07-13 10:51:56 BST"
swishdescription:11 (138904) S: "Engi.....etc "
I then ran:
[root@tiger archive]# /usr/local/lib/swish-e/DirTree.pl edjanfeb06.pdf | grep title
and got
<meta name="title" content="Jan Feb 06">
but no <title>Jan Feb 06</title> element.
Do I need to add the title method in SWISH::Filters::Pdf2HTML.pm?
Is there something I should add to the swish-e config file so that it
works with the converted PDF data (i.e. the HTML)?
Thanks
Luke
Received on Thu Jul 13 07:35:29 2006