RE: new function

From: Василевский Сергей <allby(at)not-real.open.by>
Date: Wed Jul 06 2005 - 07:33:18 GMT
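Maybe you could keep a pre-cache of parsed words that holds up to min_words_in_file words.
In that case there is no need to redesign the algorithm.
Once the cache is full, flush it to the index and index the rest of the file as usual.
If the file ends while the cache is still not full (that is, the file has fewer than
min_words_in_file words), simply skip indexing the cached words.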
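A minimal Python sketch of the pre-cache idea, assuming a hypothetical `add_word` callback that stands in for Swish-e's internal word-indexing routine (Swish-e itself is written in C, and `min_words_in_file` is only the proposed directive, not an existing one). Words are held back until the cache reaches `min_words_in_file` entries; after that the cache is flushed and the rest of the document is indexed directly.

```python
from typing import Callable, Iterable, List


def index_with_precache(
    words: Iterable[str],
    min_words_in_file: int,
    add_word: Callable[[str], None],  # hypothetical stand-in for the indexer
) -> bool:
    """Index one document's words, skipping documents that are too short.

    Returns True if the document was indexed, False if it was skipped.
    """
    cache: List[str] = []
    # With a non-positive threshold every document qualifies immediately.
    flushed = min_words_in_file <= 0
    for word in words:
        if flushed:
            # Threshold already reached: index directly, no buffering.
            add_word(word)
            continue
        cache.append(word)
        if len(cache) >= min_words_in_file:
            # Cache is full: the document is long enough, so flush it.
            for cached in cache:
                add_word(cached)
            cache.clear()
            flushed = True
    # If the cache never filled, its words are dropped without being indexed.
    return flushed
```

For example, with `min_words_in_file` set to 2, a three-word document is indexed in full, while a one-word document leaves the index untouched. Memory use is bounded by `min_words_in_file` words per document, and no document is parsed twice.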

> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org] 
> Sent: Tuesday, July 05, 2005 9:05 PM
> To: Василевский Сергей
> Cc: Multiple recipients of list
> Subject: Re: new function
> 
> 
> On Tue, Jul 05, 2005 at 12:47:04AM -0700, Василевский Сергей wrote:
> > I'd like to propose adding a new parameter to the config file:
> > min_words_in_file 1
> 
> Swish-e doesn't really know how many words are in a file until after
> they have been indexed.  So each document would either need to be
> parsed twice, or indexing redesigned to parse and store words before
> indexing, or to have a way to "un-index" all the words.
> 
> There's actually code to do the latter -- it's used to reject a
> document based on its title.
> 
> Is file size not a good enough indication of "too small"?
> 
> -- 
> Bill Moseley
> moseley@hank.org
> 
> Unsubscribe from or help with the swish-e list: 
>    http://swish-e.org/Discussion/
> 
> Help with Swish-e:
>    http://swish-e.org/current/docs
>    swish-e@sunsite.berkeley.edu
> 
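Bill's alternative, indexing as usual and then "un-indexing" a document that turns out to be too short, can be sketched the same way. This is a toy in-memory inverted index (`ToyIndex` and `index_then_reject` are hypothetical names), not Swish-e's actual code for rejecting documents by title; it only illustrates the rollback idea: remember which words the document added, and remove the document from those words' postings if its final word count falls below `min_words_in_file`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class ToyIndex:
    """Hypothetical in-memory inverted index: word -> set of document ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, word: str, doc_id: int) -> None:
        self.postings[word].add(doc_id)

    def remove_document(self, doc_id: int, words: Iterable[str]) -> None:
        # "Un-index": drop doc_id from the postings of each word it added.
        for word in words:
            docs = self.postings.get(word)  # .get() does not create keys
            if docs is None:
                continue
            docs.discard(doc_id)
            if not docs:
                # No other document uses this word; remove the entry.
                del self.postings[word]


def index_then_reject(
    index: ToyIndex, doc_id: int, words: Iterable[str], min_words_in_file: int
) -> bool:
    """Index every word, then roll the document back if it was too short."""
    seen: Set[str] = set()
    count = 0
    for word in words:
        index.add(word, doc_id)
        seen.add(word)
        count += 1
    if count < min_words_in_file:
        index.remove_document(doc_id, seen)
        return False
    return True
```

Unlike the pre-cache, this needs no buffering, but every rejected document costs a full indexing pass plus the rollback.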
Received on Wed Jul 6 00:33:28 2005