Re: [swish-e] Automatic MetaNames

From: Peter Karman <peter(at)>
Date: Wed Apr 09 2008 - 02:11:11 GMT
William M Conlon wrote on 4/8/08 12:43 AM:

> I took a look at the source, and while it's straightforward to  
> capture the meta data in extprog.c, feeding these attributes into the  
> parser while it's evaluating the document requires the same work as  
> doing it in a perl callback, where it's far easier.

agreed. See e.g.

> OTOH, it seems that there are repeated inquiries on the list about  
> how to insert meta data about the document into the index.  Often we  
> know things about the document that are not included in the document  
> itself, and it seems that an extension of the existing  filtering  
> mechanism might be useful.

see URL above. That version of SWISH::Filter needs to get merged back into the 
Swish-e dist. It definitely will in 2.6; not sure if it will in 2.4.x.

> To me it would be ideal to be able to feed two streams into swish-e:
> * one stream is the [filtered] content.
> * the second stream consists of document attributes that are not  
> contained in the document itself.

yes, the current architecture requires that all data be in the 'document' so 
assigning arbitrary meta data (to be stored as MetaNames and/or Properties) 
requires insertion into the content stream. That's the 2.x paradigm.
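
As a rough illustration of "insertion into the content stream", here is a minimal Python sketch of a feeder for swish-e's `-S prog` input method. It assumes the documents are HTML, that the index config lists the inserted names under MetaNames (and/or PropertyNames), and that the HTML parser picks up `<meta name=... content=...>` tags. The function names `merge_metadata` and `prog_record` are made up for this example and are not part of Swish-e.

```python
import html
import re


def merge_metadata(doc: str, attrs: dict) -> str:
    """Insert external attributes into an HTML document as <meta> tags.

    Hypothetical helper: 'attrs' holds information *about* the document
    that is not present in the document itself.
    """
    tags = "".join(
        '<meta name="{}" content="{}">\n'.format(
            html.escape(str(name), quote=True),
            html.escape(str(value), quote=True),
        )
        for name, value in attrs.items()
    )
    # Put the tags right after the opening <head> tag if there is one.
    match = re.search(r"<head(\s[^>]*)?>", doc, flags=re.IGNORECASE)
    if match:
        return doc[: match.end()] + "\n" + tags + doc[match.end():]
    # Otherwise wrap the content in a minimal HTML skeleton.
    return "<html><head>\n" + tags + "</head><body>\n" + doc + "\n</body></html>"


def prog_record(path: str, doc: str) -> bytes:
    """Format one document for the -S prog input stream.

    Only the Path-Name and Content-Length headers are emitted here;
    Content-Length is the size of the body in bytes.
    """
    body = doc.encode("utf-8")
    header = "Path-Name: {}\nContent-Length: {}\n\n".format(path, len(body))
    return header.encode("utf-8") + body


if __name__ == "__main__":
    import sys

    page = "<html><head><title>Report</title></head><body>Q1 numbers</body></html>"
    merged = merge_metadata(page, {"department": "sales", "reviewer": "Bill"})
    sys.stdout.buffer.write(prog_record("reports/q1.html", merged))
```

The two "streams" are merged per document before swish-e ever sees them, which is the 2.x workaround described above rather than a true separate channel for attributes.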

> For now, I can take these two streams and merge them before  
> indexing.  But perhaps the distinction between information in the  
> document and information about the document could be worked into your  
> Swish3 proposal?

Your idea will be implemented in Swish3, and in fact KinoSearch (one of the two 
current backend targets) is already designed with the field/value API in mind. 
The Swish3 Perl implementation will allow for storing field/value pairs at 
indexing time, outside of the 'document' content per se.

So good idea, Bill. ;)

Peter Karman  .  .  peter(at)
Received on Tue Apr 8 22:11:11 2008