Re: IgnoreLimit (was Re: Q: Segmentation Fault?)

From: <jmruiz(at)not-real.boe.es>
Date: Mon Feb 19 2001 - 16:53:03 GMT
Hi,

On 19 Feb 2001, at 8:02, Bill Moseley wrote:

> And after about 15 minutes I killed it.  Jose, would it be possible to
> make all the adjustments in one pass or must they be made one word at
> a time?
> 

I rewrote this routine (removestops) in the first days of 2.0 just to 
comply with phrase search. The problem is really hard. When you 
remove a word from the list of words you also have to adjust the 
position counter of all the remaining words, in each occurrence where 
the automatic stopword came before it. Perhaps there can be a faster 
approach...

Eg:
With just one file containing the following text:
This is a word in a phrase in this file

More or less the info is like this:
this: file 1 positions 1,9
is: file 1 position 2
in: file 1 positions 5,8

After removing "a"

this: file 1 positions 1,7
is: file 1 position 2
in: file 1 positions 4,6
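
As a rough illustration of the one-pass idea Bill asks about (this is 
not the removestops code in swish-e, just a sketch in Python with a 
made-up in-memory index and a hypothetical remove_stopwords function): 
collect the positions of all stopwords per file first, then shift each 
remaining position down by the number of removed positions that come 
before it. It assumes the positions are uncompressed and kept in plain 
lists.

```python
from bisect import bisect_left

def remove_stopwords(index, stopwords):
    """Hypothetical helper. index maps word -> {file_id: [positions]}."""
    # Step 1: per file, gather every position occupied by any stopword.
    removed = {}
    for word in stopwords:
        for file_id, positions in index.get(word, {}).items():
            removed.setdefault(file_id, []).extend(positions)
    for positions in removed.values():
        positions.sort()

    # Step 2: one pass over the remaining words, adjusting all positions.
    adjusted = {}
    for word, files in index.items():
        if word in stopwords:
            continue
        adjusted[word] = {}
        for file_id, positions in files.items():
            gone = removed.get(file_id, [])
            # bisect_left counts how many removed positions precede p.
            adjusted[word][file_id] = [p - bisect_left(gone, p) for p in positions]
    return adjusted

# "This is a word in a phrase in this file"
index = {
    "this": {1: [1, 9]},
    "is": {1: [2]},
    "a": {1: [3, 6]},
    "word": {1: [4]},
    "in": {1: [5, 8]},
    "phrase": {1: [7]},
    "file": {1: [10]},
}
result = remove_stopwords(index, {"a"})
```

With the sample sentence this gives the same numbers as above: "this" 
ends up at 1,7 and "in" at 4,6. The cost of keeping positions 
compressed is not modelled here at all.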

An "automatic" stopword like "a" appears several times in almost all 
the files, so adjusting positions is a heavy CPU/RAM process. Also, 
some of the info is compressed to save RAM and needs to be 
decompressed/compressed on the fly to recompute the positions.

cu
Jose
Received on Mon Feb 19 16:57:35 2001