Re:

From: Guido Adam <guido.adam(at)not-real.gmx.de>
Date: Tue Sep 03 2002 - 19:03:15 GMT
At 02.09.2002 07:54 +0200, you wrote:


>>1. encode the binary with base64 or the like and decode it for display
>
>That is interesting .... can you give me more details? Or give me a
>link? Once I have encoded such data, where do I have to put it? In
>properties? And after that, how can I retrieve it? Note: I'm using
>xml2 files and parser.

If you use the -S prog feature of swish-e, it is easy to put the binaries
in the index.
Just write a script that encodes the binary data and embeds it in an XML
document:

<document>
<title>a title</title>
<encoded>fhjhjhjghhgjgjgjghjg.....jghjkghjghjghjghjfghj</encoded>
<description>some useful text here</description>
..
</document>
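
As a rough sketch of such a script (not taken from the swish-e distribution), the Python below builds one record for -S prog: it base64-encodes the binary, wraps it in the XML layout shown above, and prefixes the Path-Name and Content-Length headers that the prog input format expects as far as I recall it (please check the swish-e docs). The function name build_prog_record and its parameters are made up for illustration.

```python
import base64
from xml.sax.saxutils import escape


def build_prog_record(path_name, title, description, binary):
    """Return one document (headers + XML body) as bytes for -S prog.

    Hypothetical helper: the header names follow my reading of the
    swish-e prog input format and should be verified against the docs.
    """
    # base64 output is plain ASCII, so it is safe inside an XML element.
    encoded = base64.b64encode(binary).decode("ascii")
    body = (
        "<document>\n"
        f"<title>{escape(title)}</title>\n"
        f"<encoded>{encoded}</encoded>\n"
        f"<description>{escape(description)}</description>\n"
        "</document>\n"
    ).encode("utf-8")
    # Content-Length must be the byte length of the body, not the
    # character count, so it is computed after encoding.
    headers = (
        f"Path-Name: {path_name}\n"
        f"Content-Length: {len(body)}\n"
        "\n"
    ).encode("utf-8")
    return headers + body
```

A feeder script would call this once per file and write the returned bytes to sys.stdout.buffer, which is where swish-e reads the documents from.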

For searching, you can use the Perl module delivered with swish-e and decode
the binary:

SwishSearch($handle, "a query", 1, "title encoded description", "rank desc");
my %results;
while( @results{ @standard, @props } = SwishNext( $handle )) {
         print $results{"title"};
         print decode($results{"encoded"});
}

(...or so...see docs for details)

This approach has one _big_ drawback: the encoded data has to be stored in
the index file (or rather in the props file) and will blow it up. It also
makes indexing slow, because the system has to process all the data and
produce properties for it.
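
To put a number on the blow-up: base64 turns every 3 input bytes into 4 output characters, so the stored property is about a third larger than the binary itself. A small sketch (the helper name encoded_size is made up here):

```python
import base64


def encoded_size(n_bytes):
    """Length of the base64 text for n_bytes of input, without line breaks."""
    # Every started group of 3 bytes becomes 4 characters (with padding).
    return 4 * ((n_bytes + 2) // 3)


# Cross-check the formula against the real encoder for a 30,000-byte blob.
raw = bytes(30_000)
assert len(base64.b64encode(raw)) == encoded_size(len(raw))
```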

So imho the "store the id" approach is better...


>>2. just store an id or filename which points to the binary data.
>
>I have already used MySQL, but I'm wondering if it is possible to store
>data in swish-e rather than in an external archive.

It is possible and might be useful if you have small binary properties.

I was technical director of one of the biggest German search engines until
the end of last year (Infoseek Germany), and we had an image search that
showed thumbnails for the images found. We found that it was better to set
up an external data store for them, because they were too big and made
indexing extremely slow.
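
For illustration only, the "store the id" variant could look like the Python sketch below: the index keeps just a short id as a property, and the display code maps that id to a file in an external store. The directory layout, the .jpg suffix and the names thumbnail_path and load_thumbnail are assumptions made for this example, not how Infoseek or swish-e did it.

```python
from pathlib import Path


def thumbnail_path(store_root, image_id):
    """Map an id taken from a search result to a file in an external store.

    Hypothetical layout: <store_root>/<first two chars of id>/<id>.jpg
    """
    # Only accept plain alphanumeric ids so a result can never point
    # outside the store directory (no "..", no path separators).
    if not image_id.isalnum():
        raise ValueError(f"unexpected image id: {image_id!r}")
    # Spreading files over subdirectories keeps each directory small.
    return Path(store_root) / image_id[:2] / f"{image_id}.jpg"


def load_thumbnail(store_root, image_id):
    """Read the binary from the external store at display time."""
    return thumbnail_path(store_root, image_id).read_bytes()
```

With this split the index only grows by a few bytes per document, and the binaries never pass through the indexer.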

Greetings

Guido


>Thank you for your kind answer.
>Best Regards
>
>Cristiano Corsani
>----------------------------------------
>Biblioteca Nazionale Centrale di Firenze
>Piazza Cavalleggeri 1
>50122 Firenze
>Tel.: +39 055 24919 220
>mailto:cristiano.corsani@bncf.firenze.sbn.it
>
Received on Tue Sep 3 19:06:47 2002