Hi,
I was just curious what the difference was in performance between
swish-e 2.0.1 (formerly the PHRASE codetree) and the patched swish-e 1.3.3
distributed with my Lookup package (http://bas.antraciet.nl/lookup/).
Swish-E stands for Simple Web Indexing System for Humans - Enhanced.
It was originally developed by Kevin Hughes, then at Enterprise
Integration Technologies.
Swish-E is available under the GNU GPL from http://sunsite.berkeley.edu/SWISH-E/;
the newly developed 2.0.1 is available from http://www.boe.es/swish-e/.
The test system is a Toshiba 4000 CDT laptop with 96 MB RAM and a
233 MHz Pentium II.
I ran swish-e on my Linux documentation so that it can be searched:
/usr/local/bin/swish-e -c linux.conf
'Linux' contains 106869 words, 7113 files
The linux.swish index file was being written from around 16:15:
-rw-r--r-- 1 root root 678830 Aug 29 16:19 linux.swish
At first I thought there was a bug, because it took so long and there
was no output on the screen. However, the process continued until
03:22 the next morning, when it wrote the index file:
-rw-r--r-- 1 root root 23782394 Aug 30 03:22 linux.swish
Indexing with swish-e 2.0.1 is a lengthy process: some 11 hours.
I ran my swish-e 1.3.3 with the same configuration; the total running time was
101 minutes, 9 seconds, resulting in
-rw-r--r-- 1 root root 16953088 Aug 30 12:47 linux.swish
That is roughly 6.5 times as fast.
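For anyone who wants to check the arithmetic, here is a small sketch that recomputes the figures from the timestamps and sizes quoted above. The 16:15 start time is only approximate, and the year is taken from the date of this mail, so treat the result as a rough estimate. Taking the timestamps literally gives a slightly higher ratio than rounding the 2.0.1 run to 11 hours.

```python
from datetime import datetime

# Timestamps from the ls output above; 16:15 is an approximate start time.
start_201 = datetime(2000, 8, 29, 16, 15)
end_201 = datetime(2000, 8, 30, 3, 22)

# Elapsed indexing time in minutes for each version.
minutes_201 = (end_201 - start_201).total_seconds() / 60
minutes_133 = 101 + 9 / 60  # 101 minutes, 9 seconds

# How many times faster 1.3.3 was, and how much larger the 2.0.1 index is.
speedup = minutes_201 / minutes_133
size_ratio = 23782394 / 16953088

print(f"2.0.1: {minutes_201:.0f} min, 1.3.3: {minutes_133:.1f} min")
print(f"speedup: {speedup:.1f}x, index size ratio: {size_ratio:.2f}")
```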
# This SWISH-E configuration file was used
IndexFile linux.swish
IndexName Linux
IndexDescription This
IndexAdmin webmaster@antraciet.com
IndexDir /usr/doc
IndexOnly .html .htm .txt
#NoContents .gif .xbm .au .mov .mpg .jpeg .pdf .zip .doc .xls
#FileRules filename contains .inc frameset menu
#FileRules directory contains .htaccess
# FileRules
FollowSymLinks no
# MetaNames
# PropertyNames
ReplaceRules replace /usr/doc /docs
# MinWordLimit
# MaxWordLimit
# WordCharacters
# BeginCharacters
# EndCharacters
# IgnoreLastChar
# IgnoreFirstChar
# IgnoreLimit 80 250
# IgnoreWords SwishDefault
IndexReport 3
I compiled the original swish, but it segfaulted.
For a change I also compiled Swish++ (http://www.best.com/~pjl/software/swish/),
another flavor and a total rewrite of swish. Although it has different features,
I was charmed by the indexing speed: 5 minutes 11 seconds! Its index file size:
-rw-r--r-- 1 root root 7801905 Aug 30 14:31 swish++.index
Swish++ clearly has a much faster indexing process, some of it
explained by the lack of phrase support, but a lot explained by its
author's comments: using the mmap call and some nifty algorithms.
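To illustrate the mmap idea only, here is a minimal sketch in Python. It is not Swish++'s code (Swish++ is written in C++), and the word pattern and the count_words name are my own assumptions for the example. The file is mapped read-only and scanned in place, so the operating system pages the contents in on demand instead of the program copying the file through read() calls.

```python
import mmap
import os
import re
from collections import Counter

# Hypothetical word rule for this sketch: runs of two or more ASCII letters.
WORD = re.compile(rb"[A-Za-z]{2,}")

def count_words(path):
    """Count lowercased words in one file by scanning a read-only memory map."""
    counts = Counter()
    # An empty file cannot be memory-mapped, so return early.
    if os.path.getsize(path) == 0:
        return counts
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The regex engine reads straight from the mapped pages.
            counts.update(word.lower() for word in WORD.findall(mm))
    return counts
```

Calling count_words on each file under /usr/doc and merging the counters would give a crude word list, which is of course far from a complete index.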
I am very much interested in knowing how to limit the resource
consumption of swish. Could anyone point me to configuration changes
that speed up the indexing?
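To compare configuration changes more precisely than by reading file timestamps, something like the following sketch could wrap each run. It assumes a Unix system (the resource module is not available elsewhere), and run_and_measure is just a name I made up for the example.

```python
import resource
import subprocess
import time

def run_and_measure(cmd):
    """Run cmd to completion; return (wall seconds, child CPU seconds, peak RSS)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time (user + system) consumed by child processes during this run.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is the largest peak among all children waited for so far
    # (kilobytes on Linux), so use a fresh interpreter for each measurement.
    return wall, cpu, after.ru_maxrss
```

For the run described above, the call would be run_and_measure(["/usr/local/bin/swish-e", "-c", "linux.conf"]).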
Bas Meijer
--
-- /''' Bas Meijer mailto:bas@antraciet.com
c-OO http://antraciet.com Web Services
\ > Kerkstraat 19 Postbus 256 1400 AG Bussum
\&& t. +31 35 7502100 f. +31 35 7502111
Received on Wed Aug 30 12:55:18 2000