Re: restricting swish query result sizes?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Jan 09 2006 - 20:57:28 GMT
On Mon, Jan 09, 2006 at 12:46:05PM -0800, Bill Schell wrote:
> We've got a few million documents indexed with swish here,
> distributed across multiple indices.  Queries that access
> all of the indices and that generate a lot of hits are running
> us out of physical memory (we've got 8GB too!).   For example,
> if some silly user issues a query like:   'a*' across all
> the indices, it will generate many millions of hits.  The process
> that is querying the indices via the API will grow bigger than
> available physical memory and start the machine thrashing.

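You're hitting the limits of swish.  Try a search for "not dkdkdkdkdkdk"
(a word that presumably isn't in your index, so it should match every
document) and you will likely see the same thing.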


> Swish seems to collect *all* the hits in memory so that it can rank
> them, before returning any hits at all.   If we don't care about the
> ranking, is there some way to get hits as they occur and not incur
> the big memory storage penalty?   We'd like to halt the search when
> the number of hits exceeds some threshold.

Swish just does everything in memory.  It's collecting all the results
in memory, then sorting (which takes more memory).

You might be able to hack swish to stop at some number of results, but
I don't think it knows how many results exist until it's done
fetching.  If you search "foo AND bar" it's going to get all the
results for "foo" first, then merge in the results for "bar".  So you
can't really limit results when doing the "foo" search.
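
To make that concrete, here's a toy sketch in Python.  This is NOT
swish's code or data structures; the index, the function name and the
"limit" parameter are all made up for illustration.  It just shows that
cutting off the "foo" results early drops documents that would have
survived the merge with "bar".

```python
# Toy inverted index: word -> set of document ids.
# Hypothetical data, not swish-e's real index format.
INDEX = {
    "foo": {1, 2, 3, 4, 5, 6},
    "bar": {2, 4, 9},
}

def search_and(index, left, right, limit=None):
    """AND two terms; optionally truncate the left-hand results first."""
    left_hits = sorted(index.get(left, set()))
    if limit is not None:
        # Premature cutoff: applied before we know which hits survive the AND.
        left_hits = left_hits[:limit]
    right_hits = index.get(right, set())
    return [doc for doc in left_hits if doc in right_hits]

full = search_and(INDEX, "foo", "bar")             # all matches for foo AND bar
capped = search_and(INDEX, "foo", "bar", limit=3)  # cutoff applied to "foo" only
print(full, capped)
```

With the cutoff the second search loses a document that matches both
words, so a limit is only safe after the whole query has been evaluated.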

I'd require three or more chars before the "*" in any wildcard word,
and see if that helps.  Wildcard searches are going to kill you no
matter what.
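
One way to enforce that is to check the query in your own code before
it ever reaches swish.  A rough sketch in Python, assuming a naive
whitespace split (swish's real query parser is more involved); the
function name and the MIN_PREFIX constant are hypothetical:

```python
MIN_PREFIX = 3  # assumed minimum number of characters before "*"

def short_wildcards(query, min_prefix=MIN_PREFIX):
    """Return the wildcard terms whose prefix is shorter than min_prefix."""
    bad = []
    for term in query.split():
        # Drop surrounding parentheses and quotes; this is a simplification.
        term = term.strip('()"')
        if term.endswith("*") and len(term.rstrip("*")) < min_prefix:
            bad.append(term)
    return bad

# Refuse the query if any wildcard term is too short.
rejected = short_wildcards("(ab* OR comput*)")
if rejected:
    print("wildcard prefix too short:", ", ".join(rejected))
```

A query like 'a*' gets refused up front instead of being run.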

-- 
Bill Moseley
moseley@hank.org

Received on Mon Jan 9 12:57:28 2006