Rainer Scherg RTC wrote:
>
>I've made some enhancements to swish-e 1.1 to index Non-Text or HTML files
>(e.g. to get PDF-files indexed) [I've sent the code changes to Roy].
Could you describe the code changes? Do you directly index the PDF files?
To index PDF files, I implemented the following workaround:
1. For every PDF file (for example, "myfile.pdf"), create a file
"myfile.pdf.html" that contains the plain text to be indexed.
2. When the search engine returns a hit on myfile.pdf.html, rewrite the
reference to point to myfile.pdf.
This also works for other file types, such as Word files. The only
disadvantage is that you must create the separate HTML files.
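
A minimal sketch of the two steps above, in Python. It assumes the plain
text has already been extracted from the PDF by some external tool (the
extraction itself is not shown), and the names write_companion and
hit_to_original are hypothetical, chosen only for illustration.

```python
import html
from pathlib import Path

SUFFIX = ".html"  # companion-file suffix, as in "myfile.pdf.html"


def write_companion(original: Path, text: str) -> Path:
    """Step 1: write the already-extracted plain text of `original`
    to '<original>.html' so the indexer can pick it up."""
    companion = original.with_name(original.name + SUFFIX)
    title = html.escape(original.name)
    body = html.escape(text)  # escape <, >, & so the text is valid HTML
    companion.write_text(
        f"<html><head><title>{title}</title></head>"
        f"<body><pre>{body}</pre></body></html>\n",
        encoding="utf-8",
    )
    return companion


def hit_to_original(hit: str, doc_exts=(".pdf", ".doc")) -> str:
    """Step 2: map a hit on 'myfile.pdf.html' back to 'myfile.pdf'.
    Hits on ordinary HTML files are returned unchanged."""
    if hit.endswith(SUFFIX):
        stem = hit[: -len(SUFFIX)]
        if stem.lower().endswith(doc_exts):
            return stem
    return hit
```

The rewrite step only strips the suffix when what remains ends in a known
document extension, so a hit on a regular page such as index.html is left
alone. The list of extensions is an assumption and would need to match
whatever file types have companion files.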
--
Patrick Fitzgerald, HP Internet and System Security Lab
http://issl.atl.hp.com/lab/employees/fitz/
fitz@issl.atl.hp.com -or- patrick_fitzgerald@hp.com
(do *not* use pat_fitzgerald@hp.com, that is not me)
Received on Mon Aug 10 10:27:21 1998