Re: [swish-e] Automatic MetaNames

From: William M Conlon <bill(at)not-real.tothept.com>
Date: Tue Apr 08 2008 - 05:43:07 GMT
On Apr 5, 2008, at 8:40 PM, Peter Karman wrote:
>
>
> William M Conlon wrote on 4/4/08 5:09 PM:
>> I have a list of documents to be indexed.  In addition to the
>> document path, the list includes other attributes that should be
>> searchable, so they need to be included in the index, although they
>> may not be in the document itself.
>>
>> My first thought was to use -S prog, with my external program reading
>> each document, generating HTML to feed swish-e, and inserting <meta
>> name="language" content="english"> for each attribute into the <head>
>> section of the HTML.
>
> That's what I would do.
>
>
>>
>> My second thought was that swish-e needs to accept attributes that
>> are fed to the indexer with the document, perhaps in a *NEW*
>> Attribute header, a la:
>
> Would require hacking the source. And not really a good change, imo.
> It means applying parsing and tokenization at the header-parsing stage,
> which just seems unnecessary, especially when the MetaName feature
> already supports HTML or XML tags in the document content.
>

I took a look at the source, and while it's straightforward to
capture the metadata in extprog.c, feeding these attributes into the
parser while it's evaluating the document takes the same work as
doing it in a Perl callback, where it's far easier.

OTOH, there are repeated inquiries on the list about how to insert
metadata about a document into the index.  Often we know things about
the document that are not included in the document itself, and an
extension of the existing filtering mechanism might be useful.

To me it would be ideal to be able to feed two streams into swish-e:
* one stream is the [filtered] content.
* the second stream consists of document attributes that are not  
contained in the document itself.

For now, I can take these two streams and merge them before  
indexing.  But perhaps the distinction between information in the  
document and information about the document could be worked into your  
Swish3 proposal?
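A minimal sketch of that merge step, written in Python and assuming swish-e's `-S prog` input format as I understand it: each document is a block of headers (`Path-Name:`, `Content-Length:` in bytes, optionally `Document-Type:`), a blank line, then exactly that many bytes of content. The list-file layout (path, then tab-separated `name=value` pairs) and the function names `add_meta_tags`, `prog_record` and `parse_listing_line` are hypothetical, made up for illustration.

```python
import html
import re
import sys

# Matches <head> or <head ...>, but not <header>.
HEAD_TAG = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_meta_tags(doc_html: str, attrs: dict[str, str]) -> str:
    """Insert one <meta name=... content=...> tag per attribute into <head>."""
    tags = "".join(
        f'<meta name="{html.escape(name)}" content="{html.escape(value)}">\n'
        for name, value in attrs.items()
    )
    match = HEAD_TAG.search(doc_html)
    if match:
        # Put the tags right after the opening <head> tag.
        return doc_html[: match.end()] + "\n" + tags + doc_html[match.end():]
    # No <head>: wrap the content in a minimal HTML document.
    return f"<html><head>\n{tags}</head><body>\n{doc_html}\n</body></html>\n"


def prog_record(path: str, content: str) -> bytes:
    """Format one document as headers, a blank line, then the body bytes."""
    body = content.encode("utf-8")
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # byte count, not character count
        "Document-Type: HTML*\n"
        "\n"
    ).encode("utf-8")
    return header + body


def parse_listing_line(line: str) -> tuple[str, dict[str, str]]:
    """Hypothetical list format: path, then tab-separated name=value pairs."""
    path, *pairs = line.rstrip("\n").split("\t")
    attrs = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    return path, attrs


def main(listing: str) -> None:
    out = sys.stdout.buffer
    with open(listing, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            path, attrs = parse_listing_line(line)
            with open(path, encoding="utf-8", errors="replace") as doc:
                merged = add_meta_tags(doc.read(), attrs)
            out.write(prog_record(path, merged))
    out.flush()


if __name__ == "__main__":
    main(sys.argv[1])
```

The attribute names would presumably also need to be declared in the config (e.g. `MetaNames language`) so swish-e indexes the `<meta>` tags as searchable fields, and the script would be run along the lines of `swish-e -c swish.conf -S prog -i ./merge_meta.py`, with the list file passed via `SwishProgParameters`. Treat those invocation details as assumptions to check against the swish-e docs.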

thx.

--bill


>> And my last thought was to overload the Path-Name with the attributes
>> and use ExtractPath to build metanames.
>>
>
> that's do-able too. But I would still use <meta> tags myself.
>
>
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Apr 8 01:43:10 2008