performance comparison

From: Bas Meijer <bas(at)not-real.antraciet.nl>
Date: Wed Aug 30 2000 - 12:51:00 GMT
Hi,


I was just curious what the difference was in performance between
swish-e 2.0.1 (formerly the PHRASE codetree) and the patched swish-e 1.3.3
distributed with my Lookup package (http://bas.antraciet.nl/lookup/).

Swish-E stands for Simple Web Indexing System for Humans - Enhanced.
It was originally developed by Kevin Hughes, then at Enterprise
Integration Technologies.

Swish-E is available under the GNU GPL from http://sunsite.berkeley.edu/SWISH-E/;
the newly developed 2.0.1 is available from http://www.boe.es/swish-e/.

The test system is a Toshiba 4000 CDT laptop with 96 MB RAM and a
Pentium II at 233 MHz.

I ran swish-e on my Linux Documentation so we can search in it:
/usr/local/bin/swish-e -c linux.conf

'Linux' contains 106869 words, 7113 files
the linux.swish index file was being written to from around 16:15:
-rw-r--r--    1 root     root       678830 Aug 29 16:19 linux.swish
At first I thought there was a bug, because it took so long and there
was no output to the screen. However, the process continued until
03:22 the next morning, when it wrote the index file:
-rw-r--r--    1 root     root     23782394 Aug 30 03:22 linux.swish
Indexing with swish-e 2.0.1 is a lengthy process: some 11 hours.

I ran my swish-e 1.3.3 with the same configuration with a total running time of
101 minutes, 9 seconds, resulting in
-rw-r--r--    1 root     root     16953088 Aug 30 12:47 linux.swish
That is roughly 6.5 times as fast.
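
For anyone who wants to check the arithmetic, here is a small Python sketch using only the timestamps and sizes quoted above. The 16:15 start time is approximate, so the result is a rough figure: taking 16:15 literally gives about 6.6, while rounding the run to 11 hours gives about 6.5. The variable names are just for illustration.

```python
from datetime import datetime

# Figures copied from the message; the 16:15 start is only approximate.
start_201 = datetime(2000, 8, 29, 16, 15)
end_201 = datetime(2000, 8, 30, 3, 22)
seconds_201 = (end_201 - start_201).total_seconds()  # swish-e 2.0.1 wall-clock time
seconds_133 = 101 * 60 + 9                           # swish-e 1.3.3: 101 min 9 s

speedup = seconds_201 / seconds_133   # how much faster 1.3.3 finished
size_ratio = 23782394 / 16953088      # 2.0.1 index size relative to 1.3.3

print(f"2.0.1 elapsed: {seconds_201 / 3600:.2f} h")
print(f"1.3.3 finished {speedup:.1f}x faster")
print(f"2.0.1 index is {size_ratio:.2f}x the size of the 1.3.3 index")
```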


# This SWISH-E configuration file was used
IndexFile linux.swish
IndexName Linux
IndexDescription This
IndexAdmin webmaster@antraciet.com
IndexDir /usr/doc
IndexOnly .html .htm .txt
#NoContents .gif .xbm .au .mov .mpg .jpeg .pdf .zip .doc .xls
#FileRules filename contains .inc frameset menu
#FileRules directory contains .htaccess
# FileRules
FollowSymLinks no
# MetaNames
# PropertyNames
ReplaceRules replace /usr/doc /docs
# MinWordLimit
# MaxWordLimit
# WordCharacters
# BeginCharacters
# EndCharacters
# IgnoreLastChar
# IgnoreFirstChar
# IgnoreLimit 80 250
# IgnoreWords SwishDefault
IndexReport 3

I compiled the original swish, but it segfaulted.

For a change I also compiled Swish++ (http://www.best.com/~pjl/software/swish/),
another flavor and a total rewrite of swish. Although it has different features,
I was charmed by the indexing speed: 5 minutes 11 seconds!! And its index file size:

-rw-r--r--    1 root     root      7801905 Aug 30 14:31 swish++.index

Swish++ clearly has a much faster indexing process. Some of that is
explained by its lack of phrase support, but a lot is explained by its
author's comments: it uses the mmap call and some nifty algorithms.
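
To illustrate the mmap idea only: the sketch below is not Swish++ code (Swish++ is written in C++), and the function name `count_words` and the word pattern are made up for this example. It maps a file into memory read-only and scans it for words, so the operating system pages the data in on demand instead of the program copying it through read buffers.

```python
import mmap
import re
from collections import Counter

# Hypothetical word definition for this sketch: runs of ASCII letters and digits.
WORD = re.compile(rb"[A-Za-z0-9]+")

def count_words(path):
    """Count lowercased words in a file by scanning a read-only memory map."""
    counts = Counter()
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return counts  # an empty file cannot be mapped
        with mm:
            # The re module can search a memory-mapped file directly.
            counts.update(word.lower() for word in WORD.findall(mm))
    return counts
```

Calling `count_words("/usr/doc/some-file.txt")` (a hypothetical path) returns a `Counter` keyed by byte strings. A real indexer would also record file and position information per word.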

I am very much interested in knowing how to limit the resource
consumption of swish. Could anyone point me to configuration changes
that speed up the indexing?



Bas Meijer

-- 

--  /'''     Bas Meijer mailto:bas@antraciet.com
     c-OO     http://antraciet.com Web Services
     \  >     Kerkstraat 19 Postbus 256 1400 AG Bussum
      \&&     t. +31 35 7502100  f. +31 35 7502111
Received on Wed Aug 30 12:55:18 2000