RE: Document Summaries/Descriptions

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Nov 15 2000 - 19:00:14 GMT

At 10:41 AM 11/15/00 -0800, jmruiz@boe.es wrote:
>Just another point of view. If the summary is stored with
>the filepath, all the file related data is contiguous in the 
>index file, making retrievals faster (less I/O may be expected).
>
>If we use properties, we need at least one extra I/O operation
>because the data is not contiguous.

Oh, I see.  I need to review the index file format again -- if I can figure
it out...
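
To make the I/O argument concrete, here is a minimal Python sketch. It does not reflect the real swish-e index format: the fixed field widths, the 4-byte property offset, and all names (CountingReader, lookup_contiguous, lookup_with_property) are assumptions made up for illustration. It only counts positioned reads for the two layouts being compared.

```python
import io
import struct

PATH_LEN, SUMMARY_LEN = 64, 128  # hypothetical fixed field widths


class CountingReader:
    """Wraps a byte buffer and counts positioned reads (a stand-in for disk I/O)."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.reads = 0

    def read_at(self, offset, size):
        self._buf.seek(offset)
        self.reads += 1
        return self._buf.read(size)


def pad(text, width):
    # Encode and pad with NUL bytes to a fixed width.
    return text.encode("utf-8").ljust(width, b"\0")


def unpad(raw):
    return raw.rstrip(b"\0").decode("utf-8")


def lookup_contiguous(reader):
    # Path and summary sit side by side: one read fetches both.
    raw = reader.read_at(0, PATH_LEN + SUMMARY_LEN)
    return unpad(raw[:PATH_LEN]), unpad(raw[PATH_LEN:])


def lookup_with_property(reader):
    # The file record holds only the path plus an offset to the summary,
    # which is stored elsewhere: a second read is needed.
    raw = reader.read_at(0, PATH_LEN + 4)
    (prop_offset,) = struct.unpack("<I", raw[PATH_LEN:])
    summary = reader.read_at(prop_offset, SUMMARY_LEN)
    return unpad(raw[:PATH_LEN]), unpad(summary)


path, summary = "/docs/index.html", "Example summary text"

contiguous = CountingReader(pad(path, PATH_LEN) + pad(summary, SUMMARY_LEN))

prop_offset = 4096  # arbitrary location of the property section
record = pad(path, PATH_LEN) + struct.pack("<I", prop_offset)
split = CountingReader(record.ljust(prop_offset, b"\0") + pad(summary, SUMMARY_LEN))

result_a = lookup_contiguous(contiguous)
result_b = lookup_with_property(split)
print(contiguous.reads, split.reads)
```

Both lookups return the same path and summary; the difference is only the number of reads. Whether that extra read matters in practice depends on caching and on how far apart the two sections are on disk.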

>BTW, this makes me wonder why swish-e uses just one
>index file. The only reason that comes to mind is simplicity, but...
>
>- The total index file size is limited to 2GB (well, I know that our
>sites are probably not like Google).
>- Updating, inserting and deleting are really hard to do. They should be
>easier with several files. E.g.: one for the header and words, another
>one for the words' data, another one for the files' data and another one
>for the properties.
>
>What do you think?

I can't see it being any problem.  Frankly, I like a single file, but for
no good reason.  I was wondering about this some time back, not so much
about swish, but about Perl's DBM usage and how BerkeleyDB seemed to use
one file and ndbm (sdbm?) used .pag and .dir files.
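
On the 2GB figure quoted above: a limit of that size is what you get when byte offsets are stored as signed 32-bit integers. Whether that is the actual cause in swish-e is an assumption here, not something stated in this thread. A small Python sketch of the arithmetic, with a hypothetical helper name:

```python
import struct

MAX_INT32 = 2**31 - 1  # largest value a signed 32-bit integer can hold


def fits_in_int32_offset(offset):
    """Return True if a byte offset can be stored as a signed 32-bit integer."""
    try:
        struct.pack("<i", offset)
        return True
    except struct.error:
        return False


print(MAX_INT32)                            # 2147483647 bytes, just under 2 GiB
print(fits_in_int32_offset(MAX_INT32))      # True
print(fits_in_int32_offset(MAX_INT32 + 1))  # False
```

Splitting the index into several files would give each file its own offset space, so each one could grow to that limit independently.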


Bill Moseley
mailto:moseley@hank.org
Received on Wed Nov 15 19:01:55 2000