At 07:14 PM 02/22/02 -0800, Keith Thompson wrote:
>
>>I'd stay away from merge.
>
>Can you give me some specifics? Is it just plain broken,
>or does it have specific known problems?

Under normal indexing, memory optimizations are made after each file is
indexed. When merging, though, indexing is not done file-by-file, so those
optimizations cannot be done.

It may be that merge.c can be updated to better compress data while
merging. I do remember trying to use some of the compression routines when
rewriting merge.c. Jose would probably need to review the code to see if
anything could be done to improve it. I get lost trying to follow the
compression and optimization code that's used.

>>If you were to store the documents compressed in something like MySQL as
>>they come in then it makes managing a system like that a bit easier since
>>you can timestamp things and not have to worry about duplicate files. And
>>then you always have copies of the original documents.
>
>I've considered this, but we're talking about enough potential data that
>storage of the files is prohibitive. Plus, in some cases the content
>of the files is such that I'm not supposed to be keeping them for
>security reasons. So, as much as something like this would help,
>I [unfortunately] don't want to consider it unless necessary.

If there's so much data that storage is a problem, then the real issue will
be RAM for indexing.
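
For what it's worth, the "store the documents compressed as they come in" idea quoted above might look roughly like the sketch below. This is only an illustration under assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name `docs`, its columns, and the helper names `open_store`, `store_document` and `documents_since` are all made up for the example. Duplicates are detected by hashing the uncompressed content, which is one possible approach, not the only one.

```python
import hashlib
import sqlite3
import time
import zlib


def open_store(path=":memory:"):
    """Create the (hypothetical) document table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        " digest TEXT PRIMARY KEY,"   # SHA-256 of the uncompressed content
        " name TEXT NOT NULL,"
        " stored_at REAL NOT NULL,"   # timestamp, seconds since the epoch
        " body BLOB NOT NULL)"        # zlib-compressed content
    )
    return conn


def store_document(conn, name, content):
    """Store content (bytes) compressed; return False if it is a duplicate."""
    digest = hashlib.sha256(content).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO docs (digest, name, stored_at, body)"
        " VALUES (?, ?, ?, ?)",
        (digest, name, time.time(), zlib.compress(content)),
    )
    conn.commit()
    # rowcount is 0 when the digest already existed and the insert was ignored
    return cur.rowcount == 1


def documents_since(conn, timestamp):
    """Yield (name, content) for documents stored at or after timestamp."""
    rows = conn.execute(
        "SELECT name, body FROM docs WHERE stored_at >= ? ORDER BY stored_at",
        (timestamp,),
    )
    for name, body in rows:
        yield name, zlib.decompress(body)
```

With something like this, a periodic job could feed only the documents returned by `documents_since(conn, last_run)` to the indexer, though that still keeps a copy of every file, which is exactly the storage and security cost raised above.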
--
Bill Moseley
mailto:moseley@hank.org
Received on Sat Feb 23 06:56:30 2002