Re: PDF indexing

From: <moseley(at)not-real.hank.org>
Date: Tue Sep 16 2003 - 15:27:21 GMT
On Tue, Sep 16, 2003 at 01:21:52AM -0700, redna@euskalerria.org wrote:
> Hi all:
> 
> Is there a way to attach meta tags when indexing pdf files?
> or must I convert it first to HTML?

No, not directly.

You mean other than adding meta data to the pdf file?

If you are using spider.pl to index, you can add a filter that modifies 
the content any way you like.  SwishSpiderConfig.pl includes some 
examples.

Yes, if you filter to HTML first then you can add any meta tags you 
like.  You can do that with HTML::TreeBuilder or HTML::Parser, or 
simply with a regular expression substitution.
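
As a rough illustration of the regular expression approach, here is a minimal sketch in Python (not the Perl a spider.pl filter would actually use). It assumes the PDF has already been converted to HTML by some other tool; the function name add_meta_tags and the meta names are hypothetical, made up for this example.

```python
import html
import re


def add_meta_tags(html_text, meta):
    """Insert <meta> tags right after the opening <head> tag.

    html_text: HTML produced by an earlier PDF-to-HTML step (assumed).
    meta: dict of meta name -> content (hypothetical example values).
    """
    tags = "".join(
        '<meta name="{}" content="{}">\n'.format(
            html.escape(name, quote=True), html.escape(value, quote=True)
        )
        for name, value in meta.items()
    )
    # Match the opening <head> tag, case-insensitively, with optional attributes.
    head_open = re.compile(r"<head\b[^>]*>", re.IGNORECASE)
    if head_open.search(html_text):
        # A function replacement keeps backslashes in the tags literal.
        return head_open.sub(lambda m: m.group(0) + "\n" + tags, html_text, count=1)
    # No <head> found: prepend a minimal one so the tags are still present.
    return "<head>\n" + tags + "</head>\n" + html_text
```

The same idea carries over to a one-line substitution in Perl; a parser-based approach is more robust if the converted HTML is irregular.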

I have often thought that a way to add meta data to the headers when 
using -S prog might be nice.


-- 
Bill Moseley
moseley@hank.org
Received on Tue Sep 16 15:27:31 2003