Re: Special Characters

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu May 06 2004 - 03:19:49 GMT
On Wed, May 05, 2004 at 06:01:35PM -0700, Thomas Sewell wrote:
> When someone performs a large search using some particular special
> characters where there should be results, the outcome is a big CPU
> spin for swish-e

Doesn't seem likely that non-WordCharacters would be the issue.  Does
the query finally complete if you try it from the command line?

  
> Which translates to: The President's Daughter by Jack Higgins In the
> search box on the form.

The way swish-e works, that ends up being seven searches all ANDed
together (by default "President's" will be two search words).  So if you
have a huge index it can take a while.  Yes, there's a potential DoS.
Wildcard searches are even more expensive.  People tend to hit reload
when the query is slow, which only makes things worse.
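A minimal Python sketch of the word splitting and ANDing described above. It assumes a hypothetical `WORD_CHARACTERS` set of ASCII letters and digits (not swish-e's real default table) and a toy in-memory index; it only illustrates why this query turns into seven separate lookups and is not how swish-e is implemented. The optional `stopwords` argument plays roughly the role of IgnoreWords.

```python
import re

# Hypothetical stand-in for swish-e's WordCharacters setting: ASCII letters
# and digits only. The apostrophe is not a word character, so
# "President's" splits into "president" and "s".
WORD_CHARACTERS = "abcdefghijklmnopqrstuvwxyz0123456789"


def split_query(query, word_chars=WORD_CHARACTERS, stopwords=()):
    """Split a query into lowercase search words, dropping any stopwords."""
    pattern = "[" + re.escape(word_chars) + "]+"
    stop = {w.lower() for w in stopwords}
    return [w for w in re.findall(pattern, query.lower()) if w not in stop]


def and_search(index, words):
    """Look up each word separately and intersect the doc-id sets.

    `index` is a toy dict mapping word -> set of doc ids (hypothetical).
    The work grows with the number of words in the query.
    """
    result = None
    for word in words:
        postings = index.get(word, set())
        result = set(postings) if result is None else result & postings
        if not result:
            return set()  # AND with an empty set stays empty
    return result if result is not None else set()


if __name__ == "__main__":
    words = split_query("The President's Daughter by Jack Higgins")
    print(len(words), words)
    # prints: 7 ['the', 'president', 's', 'daughter', 'by', 'jack', 'higgins']
```

With `stopwords=["the", "by"]` the same query drops to five lookups, which is one reason stopwords matter for long queries.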

Are you using any stopwords (IgnoreWords)?

On one site I actually run ps, count the number of swish-e binaries
running, and limit searches that way.  Under mod_perl, for example, the
number of Apache children limits the number of concurrent requests.
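A rough Python sketch of the "count the running swish-e processes" approach, assuming a `ps` that accepts `-eo comm=` (procps on Linux and BSD-style ps both do). `MAX_SWISH_PROCESSES` and the helper names are hypothetical, not part of swish-e.

```python
import os
import subprocess

# Hypothetical limit on simultaneous swish-e processes; tune for the box.
MAX_SWISH_PROCESSES = 5


def count_processes(ps_output, name="swish-e"):
    """Count lines of `ps -eo comm=` output whose command is `name`.

    basename() is used because some ps implementations print a full path.
    """
    return sum(
        1 for line in ps_output.splitlines()
        if os.path.basename(line.strip()) == name
    )


def running_swish_count(name="swish-e"):
    """Run ps and count matching processes (assumes ps accepts -eo)."""
    out = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return count_processes(out, name)


def may_start_search(current_count, limit=MAX_SWISH_PROCESSES):
    """Allow a new search only while fewer than `limit` are running."""
    return current_count < limit
```

A front end would call `may_start_search(running_swish_count())` before launching swish-e and return a "server busy" page otherwise. The check and the launch are not atomic, so this is a soft limit rather than a hard one.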

> Does the special character cause some sort of much larger looping to
> occur? The index in use is fairly large, with a specification to
> search 17 different indexes with a total size of 4.5 GBs.

No, I don't think it's the special characters, but rather the number of
search words and the number and sizes of indexes.  Are you running the swish-e
binary?  There will be a bit of overhead opening all those indexes for
every request if you are running the binary.

But if there's truly a spinning situation, then running the query under
gdb and hitting ^C while it's spinning may show where it's hanging.

> How can I fix this type of thing, or at least limit it more on the
> front-end to avoid the massive cpu spin-up?

Well, searching multiple indexes is slower than searching a single index
because of the extra work of opening each index, and if you sort by
anything other than rank, sorting can be extra slow (the property file
has to be accessed for all the results to do a tape-merge type of sort).
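A rough Python illustration of why a non-rank sort costs more; it is not swish-e's code. Every result needs its sort key fetched first (a hypothetical `fetch_property` callback stands in for reading the property file), each index's results are sorted, and the sorted runs are then merged with `heapq.merge`, in the spirit of a tape-merge.

```python
import heapq


def sort_across_indexes(results_per_index, fetch_property):
    """Sort results from several indexes by a property value.

    results_per_index: list of lists of (index_name, doc_id).
    fetch_property(index_name, doc_id) -> sort key. This callback is
    hypothetical; in swish-e it would mean touching the property file
    once for every single result, which is the expensive part.
    """
    runs = []
    for results in results_per_index:
        # One property lookup per result, then sort this index's run.
        keyed = [(fetch_property(idx, doc), idx, doc) for idx, doc in results]
        keyed.sort()
        runs.append(keyed)
    # Merge the already-sorted runs, like the merge phase of an external sort.
    return [(idx, doc) for _key, idx, doc in heapq.merge(*runs)]
```

Sorting by rank avoids the per-result property lookups entirely, which is why it stays cheap even with many results.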

If multiple simultaneous requests from the same browser are a big
problem, then maybe use sessions to limit users to one search at a time.
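A minimal Python sketch of the "one search per session" idea. The session ID is assumed to come from a cookie handled by your web framework; `OneSearchPerSession` and `run_limited` are hypothetical helpers, not part of swish-e.

```python
import threading


class OneSearchPerSession:
    """Allow at most one in-flight search per session ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = set()

    def try_begin(self, session_id):
        """Mark a search as started; return False if one is already running."""
        with self._lock:
            if session_id in self._active:
                return False
            self._active.add(session_id)
            return True

    def end(self, session_id):
        """Mark the session's search as finished."""
        with self._lock:
            self._active.discard(session_id)


def run_limited(guard, session_id, search):
    """Run search() unless this session already has one in progress."""
    if not guard.try_begin(session_id):
        return None  # caller can show "a search is already running"
    try:
        return search()
    finally:
        guard.end(session_id)  # release even if the search raises
```

This in-memory set only works inside one process; with several Apache children or CGI processes, the "active" marker would have to live in shared storage such as the session store itself.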

-- 
Bill Moseley
moseley@hank.org
Received on Wed May 5 20:19:52 2004