Re: [swish-e] Automatic MetaNames

From: Peter Karman <peter(at)>
Date: Wed Apr 09 2008 - 02:57:08 GMT
William M Conlon wrote on 4/8/08 9:36 PM:
> On Apr 8, 2008, at 7:11 PM, Peter Karman wrote:
>>> OTOH, it seems that there are repeated inquiries on the list about
>>> how to insert meta data about the document into the index.  Often we
>>> know things about the document that are not included in the document
>>> itself, and it seems that an extension of the existing  filtering
>>> mechanism might be useful.
>> see URL above. That version of SWISH::Filter needs to get merged  
>> back into the
>> Swish-e dist. It definitely will in 2.6; not sure if it will in 2.4.x.
> hmm.  I've just about finished hacking to add another user-defined
> callback function to allow me to insert the additional
> attributes into ALL documents, including the TEXT/HTML types that are
> normally not filtered.

Yes, doing it at the aggregator level is much better than using SWISH::Filter. I
only referenced that feature in SWISH::Filter to show that "the existing
filtering mechanism" already had what you were asking about.

But you want non-filtered docs (html, txt, xml) to get the metadata too. So 
hacking is better.

fwiw, SWISH::Prog makes this easy.

That should be making its way to CPAN in the next few days I hope.

> But it looks like the meta_data() method would allow me to instead
> build a filter that inserts the attributes as meta data.  I take it I
> need to update the filters (such as pdf2html) to use set_continue, so
> that after type conversion, my attribute_insertion filter gets called?

You could use SWISH::Filter and write an AddMetadata filter, I guess. Yes, the
existing filters would need set_continue() set to true to get the chaining
effect. If it were me, I'd be doing it in the aggregator ( e.g.)
instead though, since then you could add the metadata just before you print() to
-S prog.
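
To make the aggregator approach concrete, here is a minimal sketch of the
"add the metadata just before you print()" step. It is written in Python only
because -S prog reads the output of any external program; it is not taken from
the Swish-e distribution or from SWISH::Prog. The function names
(add_meta_tags, prog_record), the sample path, and the "department" attribute
are all hypothetical. It also assumes the document is HTML and that injecting
<meta> tags into <head> is an acceptable way to carry the extra attributes.

```python
import html
import re
import sys


def add_meta_tags(html_text, extra):
    """Return html_text with one <meta> tag per item in extra, placed in <head>."""
    tags = "".join(
        '<meta name="{}" content="{}">\n'.format(
            html.escape(name, quote=True), html.escape(value, quote=True)
        )
        for name, value in extra.items()
    )
    # Match <head> or <head ...>, but not <header>.
    match = re.search(r"<head(\s[^>]*)?>", html_text, flags=re.IGNORECASE)
    if match:
        # Insert the new tags right after the opening <head> tag.
        return html_text[: match.end()] + "\n" + tags + html_text[match.end():]
    # Assumption: with no <head> present, prepending one is good enough
    # for a lenient HTML parser.
    return "<head>\n" + tags + "</head>\n" + html_text


def prog_record(path, content, doc_type="HTML*"):
    """Build one -S prog record: headers, a blank line, then the document."""
    body = content.encode("utf-8")
    header = "Path-Name: {}\nContent-Length: {}\nDocument-Type: {}\n\n".format(
        path, len(body), doc_type  # Content-Length is counted in bytes
    )
    return header.encode("utf-8") + body


if __name__ == "__main__":
    # Hypothetical document and attribute, for illustration only.
    doc = "<html><head><title>Report</title></head><body>Hello</body></html>"
    tagged = add_meta_tags(doc, {"department": "engineering"})
    sys.stdout.buffer.write(prog_record("docs/report.html", tagged))
```

A script like this would be run with something along the lines of
"swish-e -S prog -i ./add_meta.py -c swish.conf", and the config would need a
matching "MetaNames department" line for the attribute to be searchable.
Plain-text documents have no <head> to inject into, so they would need a
different treatment (for example, wrapping them as HTML), which this sketch
does not attempt.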

Peter Karman  .  .  peter(at)
Received on Tue Apr 8 22:57:06 2008