Re: severe swish-e problem (1.3.2)

From: Jose Manuel Ruiz <jmruiz(at)not-real.boe.es>
Date: Mon Jun 19 2000 - 14:00:26 GMT
Hi, Rainer

Rainer.Scherg@rexroth.de wrote:
> 
> Hi,
> 
> In our company a severe problem with swish-e (1.3.2-filter) occurred.
> 
> What happened:
> 
>   Someone executed a search for "r*" - due to our large index
>   (16000 documents) it takes swish "a little" time to get the
>   result.
>   Due to the response/search time the user executed the search
>   request several times. But this was not the main problem.
> 
>   Swish-e uses a vast(!) amount of memory. In our case 2 GB.
>   This caused our main server (large SUN-Server, with 0.8 TBytes)
>   to be rebooted (manually), because other processes failed due
>   to a lack of swapspace/memory, etc...
> 
>   Remark: I used the swish-option  "-m 500"

There is an even worse search. Try "a* or b* or c* ..."
and you get an effective DoS attack.
Swish-e needs to keep all the results in memory, which is why
it uses such a vast amount of memory. The -m option does not solve
the problem because it is applied only after all the results are
already in memory.
But there is something even worse. Swish-e-1.3.2 pre-computes
the wildcard search into a list of "or" terms. So "r*" becomes
"r1 or r2 or r3..." (r1, r2 and r3 are the matching words). This
makes swish-e-1.3.2 very slow with this type of search. Each "or"
wastes even more memory when computing intermediate results, because
memory that is no longer used is not freed!
For this reason I completely rewrote the wildcard search
for the PHRASE version and added several calls to efree.
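
A minimal sketch of why the "or" expansion is so costly, written in
Python with a made-up toy index. INDEX, expand_prefix, search_or_chain
and search_single_pass are hypothetical names for illustration only;
this is not swish-e's code or index format, and not the actual PHRASE
rewrite.

```python
from bisect import bisect_left

# Hypothetical toy index: word -> set of document ids.
INDEX = {
    "rain": {1, 2},
    "read": {2, 3},
    "road": {4},
    "sun": {1, 5},
}
WORDS = sorted(INDEX)


def expand_prefix(prefix):
    """Return every indexed word starting with prefix (sorted scan)."""
    start = bisect_left(WORDS, prefix)
    matches = []
    for word in WORDS[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


def search_or_chain(prefix):
    """Mimic 'r1 or r2 or r3': every 'or' builds a new intermediate result.

    The intermediates are kept alive on purpose, to imitate memory that
    is never freed. Returns (result, total entries held in intermediates).
    """
    intermediates = []
    result = set()
    for word in expand_prefix(prefix):
        result = result | INDEX[word]  # a fresh set for each 'or'
        intermediates.append(result)
    return result, sum(len(s) for s in intermediates)


def search_single_pass(prefix):
    """Accumulate into one result set; no intermediate result is retained."""
    result = set()
    for word in expand_prefix(prefix):
        result.update(INDEX[word])
    return result
```

Both functions return the same documents; the difference is only in
how many result entries are alive at the same time. With thousands of
matching words the retained intermediates grow roughly quadratically.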

> 
> There may be the following solutions to this problem:
> 
>   - use a separate machine for the search engine (e.g. a cheap
>     linux box)
> 

This does not solve the problem. Memory is cheaper, but the problem
remains.

>   - reject short query requests in the CGI script executing the
>     swish-e program.
> 

This could be the best one.
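
A rough sketch of such a check, in Python, to be run by the CGI script
before it calls swish-e. The thresholds MIN_PREFIX_LEN and
MAX_WILDCARDS and the function check_query are assumptions made up for
this example; pick values that suit your index.

```python
import re

MIN_PREFIX_LEN = 3  # assumed minimum number of characters before '*'
MAX_WILDCARDS = 2   # assumed maximum number of wildcard terms per query


def check_query(query):
    """Return (ok, reason); ok is False when the query looks too expensive."""
    # Split on whitespace and parentheses; operators such as 'or' are
    # ordinary terms here and are simply not counted as wildcards.
    terms = re.findall(r"[^\s()]+", query.lower())
    wildcards = [t for t in terms if t.endswith("*")]
    if len(wildcards) > MAX_WILDCARDS:
        return False, "too many wildcard terms"
    for term in wildcards:
        if len(term.rstrip("*")) < MIN_PREFIX_LEN:
            return False, "wildcard prefix too short: " + term
    return True, ""
```

With these limits both "r*" and "a* or b* or c*" are rejected before
swish-e is started, while "rain* and sun" passes.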

>   - restrict swish-e (via a swish option or a compile switch) to
>     a maximum number of internal result entries (this differs from
>     the "-m" option).  I know there may be implications for the
>     search results (quality, sorting, returned results, etc...)
>     But this could prevent a worst-case scenario...
> 

As you say, this is hard to implement. 
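
For a single flat list of hits the cap itself is simple. A sketch in
Python, assuming hits arrive as (rank, doc_id) pairs; collect_capped
is a hypothetical helper, not part of swish-e. It does not cover
combining capped lists across "and", "or" and "not", which is
presumably where the implications for quality and sorting come from.

```python
import heapq


def collect_capped(hits, max_entries):
    """Keep only the max_entries highest-ranked (rank, doc_id) hits.

    Memory stays bounded by max_entries no matter how many hits are fed in.
    """
    if max_entries <= 0:
        return []
    heap = []  # min-heap: the weakest hit kept so far sits at heap[0]
    for hit in hits:
        if len(heap) < max_entries:
            heapq.heappush(heap, hit)
        elif hit > heap[0]:
            heapq.heapreplace(heap, hit)  # drop the weakest, keep the new hit
    return sorted(heap, reverse=True)
```

Unlike -m, which trims the list after everything is already in
memory, a cap of this kind is applied while the hits are collected.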
 

Jose Ruiz

jmruiz@boe.es
Received on Mon Jun 19 10:04:05 2000