
Re: Incremental indexing?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Feb 23 2002 - 06:55:59 GMT
At 07:14 PM 02/22/02 -0800, Keith Thompson wrote:
>
>>I'd stay away from merge.
>
>Can you give me some specifics?  Is it just plain broken,
>or does it have specific known problems?

Under normal indexing, memory optimizations are made after each file is
indexed.  But merging is not done file-by-file, so those optimizations
cannot be applied.
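
(A hypothetical sketch, not swish-e code, just to make the difference
concrete.  Gap-encoding each word's position list is only a stand-in for
whatever per-file optimization swish-e actually performs; the point is
that it can only happen when indexing proceeds one file at a time.)

    # Hypothetical sketch, not swish-e source: per-file compression keeps
    # memory bounded in a way a whole-index merge cannot.

    def delta_encode(positions):
        """Replace absolute word positions with gaps; small gaps pack
        into far fewer bytes when the index is written out."""
        gaps, prev = [], 0
        for pos in positions:
            gaps.append(pos - prev)
            prev = pos
        return gaps

    # Normal indexing: compress right after each file, so only one
    # file's uncompressed postings are ever held in memory at once.
    index = {}
    for name, positions in {"a.html": [10, 25, 100],
                            "b.html": [3, 40, 250]}.items():
        index[name] = delta_encode(positions)   # per-file optimization

    # A merge has no such boundary: the word lists of both input
    # indexes have to be read and reconciled before anything can be
    # written, so the working set (and RAM use) is much larger.
    print(index)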

It may be that merge.c can be updated to better compress data while
merging.  I do remember trying to use some of the compression routines when
rewriting merge.c.  Jose would probably need to review the code to see if
anything could be done to improve it.  I get lost trying to follow the
compression and optimization code that's used.

>>If you were to store the documents compressed in something like MySQL as
>>they come in then it makes managing a system like that a bit easier since
>>you can timestamp things and not have to worry about duplicate files.  And
>>then you always have copies of the original documents.
>
>I've considered this, but we're talking about enough potential data that
>storage of the files is prohibitive.  Plus, in some cases the content
>of the files is such that I'm not supposed to be keeping them for
>security reasons.  So, as much as something like this would help,
>I [unfortunately] don't want to consider it unless necessary.

If there's so much data that storage is a problem, then the real issue will
be RAM for indexing.
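
(As an aside, for anyone who can afford to keep copies, the staging idea
quoted above might look roughly like the sketch below.  SQLite stands in
for MySQL here, and the table and column names are made up; the point is
just to compress each document as it arrives, key it by a content hash so
duplicates drop out, and timestamp it so an incremental run only has to
look at what changed since the last pass.)

    # Rough sketch of the "store compressed copies as they come in" idea.
    # SQLite stands in for MySQL; schema and names are hypothetical.
    import gzip, hashlib, sqlite3, time

    db = sqlite3.connect("staging.db")
    db.execute("""CREATE TABLE IF NOT EXISTS docs (
                     sha1  TEXT PRIMARY KEY,  -- content hash: skips duplicates
                     path  TEXT,
                     added INTEGER,           -- unix timestamp for incremental runs
                     body  BLOB               -- gzip-compressed original document
                 )""")

    def store(path, text):
        sha1 = hashlib.sha1(text.encode()).hexdigest()
        db.execute("INSERT OR IGNORE INTO docs VALUES (?, ?, ?, ?)",
                   (sha1, path, int(time.time()), gzip.compress(text.encode())))
        db.commit()

    def changed_since(ts):
        """Feed only new documents to the indexer on an incremental run."""
        for path, body in db.execute(
                "SELECT path, body FROM docs WHERE added > ?", (ts,)):
            yield path, gzip.decompress(body).decode()

    store("/docs/a.html", "<html>hello</html>")
    for path, text in changed_since(0):
        print(path, len(text))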

-- 
Bill Moseley
mailto:moseley@hank.org
Received on Sat Feb 23 06:56:30 2002