Re: performance comparison

From: <jmruiz(at)not-real.boe.es>
Date: Wed Aug 30 2000 - 17:47:58 GMT
Hi Bas,

You have found your problem!!
Let me explain ...

Swish-e 2.0.1 needs more memory than swish-e 1.3.x.
Why? It is simple: it has to store the position of every word while
indexing so that phrase search can work.
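
To make the memory point concrete, here is a minimal Python sketch. It is
not swish-e's actual data structures (swish-e is written in C, and the
names build_indexes and phrase_match are made up for this illustration).
It only shows that a positional index keeps one entry per word occurrence,
while a plain index keeps one entry per distinct word per file, and that
phrase matching needs those positions.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a plain and a positional index from {file name: text}.

    Illustrative only: tokenizing is a lowercase whitespace split.
    """
    plain = defaultdict(set)                              # word -> {file}
    positional = defaultdict(lambda: defaultdict(list))   # word -> {file: [pos, ...]}
    for name, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            plain[word].add(name)
            positional[word][name].append(pos)
    return plain, positional


def phrase_match(positional, phrase):
    """Return the sorted files in which the words of `phrase` are adjacent."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = []
    for name, starts in positional.get(words[0], {}).items():
        for start in starts:
            # Every following word must sit exactly i positions after the first.
            if all(start + i in positional.get(w, {}).get(name, ())
                   for i, w in enumerate(words[1:], 1)):
                hits.append(name)
                break
    return sorted(hits)


# Hypothetical sample documents, not taken from /usr/doc.
docs = {
    "a.txt": "linux kernel howto for the linux kernel",
    "b.txt": "howto build the kernel on linux",
}
plain, positional = build_indexes(docs)

# One entry per (word, file) pair versus one entry per word occurrence.
plain_entries = sum(len(files) for files in plain.values())
position_entries = sum(len(p) for files in positional.values()
                       for p in files.values())

print("plain entries:", plain_entries)
print("position entries:", position_entries)
print("files matching 'the kernel':", phrase_match(positional, "the kernel"))
```

The more often words repeat inside a file, the wider the gap between the
two counts becomes, which is why indexing a large documentation tree needs
so much more memory once positions are kept.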

So I am also sure that the cause of the slow performance is this:
your Linux box is paging!!!
When the message "Writing index entries ..." appears, the indexing
process is almost finished; it is writing all the words and positions
to disk. These are assumed to be in memory, but if your box is paging,
this step can take a very long time, as you are noticing.

BTW, this large amount of memory is only needed for indexing, not for
searching.

Try another test with fewer docs to compare performance. 2.x
should be much faster.
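
When you rerun with a smaller set, raw wall-clock times are no longer
directly comparable with the figures you posted, so it helps to normalize
to files per second. A small Python sketch, using only the numbers from
your message below; the 2.0.1 elapsed time assumes the run started at the
approximate 16:15 you mention, and the helper names are just for this
example:

```python
def seconds(hours=0, minutes=0, secs=0):
    """Convert an elapsed time to seconds."""
    return hours * 3600 + minutes * 60 + secs


def files_per_second(files, elapsed):
    """Indexing throughput, so runs over different file counts compare fairly."""
    return files / elapsed


FILES = 7113  # files reported by swish-e for the 'Linux' index

# Elapsed times as reported in the quoted message.
runs = {
    # 16:15 -> 03:22 next day; the start time is approximate (assumption).
    "swish-e 2.0.1": seconds(hours=11, minutes=7),
    "swish-e 1.3.3": seconds(minutes=101, secs=9),
    "Swish++": seconds(minutes=5, secs=11),
}

rates = {name: files_per_second(FILES, t) for name, t in runs.items()}
slowdown = runs["swish-e 2.0.1"] / runs["swish-e 1.3.3"]

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} files/s")
print(f"2.0.1 took {slowdown:.1f}x as long as 1.3.3")
```

If the files-per-second figure for 2.0.1 rises sharply on a smaller set
that fits in RAM, that points to paging rather than to the indexing code.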

cu
Jose

On 30 Aug 2000, at 5:51, Bas Meijer wrote:

> Hi,
> 
> 
> I was just curious what the difference was in performance between
> swish-e 2.0.1 (formerly the PHRASE codetree) and the patched swish-e 1.3.3
> distributed with my Lookup package (http://bas.antraciet.nl/lookup/).
> 
> Swish-E stands for Simple Web Indexing System for Humans - Enhanced.
> It was originally developed by Kevin Hughes, then at Enterprise
> Integration Technologies.
> 
> Swish-E is available under GNU GPL from http://sunsite.berkeley.edu/SWISH-E/;
> the newly developed 2.0.1 is available from http://www.boe.es/swish-e/.
> 
> The test system is a Toshiba 4000 CDT laptop with 96 MB RAM and a
> 233 MHz Pentium II.
> 
> I ran swish-e on my Linux Documentation so we can search in it:
> /usr/local/bin/swish-e -c linux.conf
> 
> 'Linux' contains 106869 words, 7113 files
> the linux.swish index file was being written to from around 16:15:
> -rw-r--r--    1 root     root       678830 Aug 29 16:19 linux.swish
> At first I thought there was a bug, because it took so long and there
> was no output to the screen. However, the process continued until
> 03:22, when it wrote the index file:
> -rw-r--r--    1 root     root     23782394 Aug 30 03:22 linux.swish
> Indexing with swish-e 2.0.1 is a lengthy process: some 11 hours.
> 
> I ran my swish-e 1.3.3 with the same configuration with a total running time of
> 101 minutes, 9 seconds, resulting in
> -rw-r--r--    1 root     root     16953088 Aug 30 12:47 linux.swish
> that is 6.5 times as fast.
> 
> 
> # This SWISH-E configuration file was used
> IndexFile linux.swish
> IndexName Linux
> IndexDescription This
> IndexAdmin webmaster@antraciet.com
> IndexDir /usr/doc
> IndexOnly .html .htm .txt
> #NoContents .gif .xbm .au .mov .mpg .jpeg .pdf .zip .doc .xls
> #FileRules filename contains .inc frameset menu
> #FileRules directory contains .htaccess
> # FileRules
> FollowSymLinks no
> # MetaNames
> # PropertyNames
> ReplaceRules replace /usr/doc /docs
> # MinWordLimit
> # MaxWordLimit
> # WordCharacters
> # BeginCharacters
> # EndCharacters
> # IgnoreLastChar
> # IgnoreFirstChar
> # IgnoreLimit 80 250
> # IgnoreWords SwishDefault
> IndexReport 3
> 
> I compiled the original swish, but it segfaulted.
> 
> For a change I also compiled Swish++ (http://www.best.com/~pjl/software/swish/),
> another flavor and a total rewrite of swish. Although it has different features,
> I was charmed by the indexing speed (5 minutes 11 seconds!!) and its file size:
> 
> -rw-r--r--    1 root     root      7801905 Aug 30 14:31 swish++.index
> 
> Swish++ clearly has a much faster indexing process, some of it
> explained by lacking the phrase stuff, but a lot explained by its
> author's comments: using the mmap call and some nifty algorithms.
> 
> I am very much interested in knowing how to limit the resource
> consumption of swish. Could anyone point me to configuration changes
> that speed up the indexing?
> 
> 
> 
> Bas Meijer
> 
> -- 
> 
> --  /'''     Bas Meijer mailto:bas@antraciet.com
>      c-OO     http://antraciet.com Web Services
>      \  >     Kerkstraat 19 Postbus 256 1400 AG Bussum
>       \&&     t. +31 35 7502100  f. +31 35 7502111
> 
Received on Wed Aug 30 17:52:11 2000