Re: HTTP Indexing Times for Different OSs and Swish

From: Bill Moseley <moseley(at)>
Date: Fri Jan 11 2002 - 21:03:29 GMT
At 12:12 PM 01/11/02 -0800, Deane Barker wrote:
>Machine A:  Athlon 750 Mhz, 224MB RAM running Mandrake Linux 8.1  ("SWISH-E
>Machine B:  Athlon 1 GHz, 384MB RAM running Windows XP Home  ("SWISH-E

Those are not the same version of swish.  2.0.x is MUCH slower than

Also, an indexing run that lasts only three seconds is not a very good test.

If you have the same 30,000 files on Windows and on Linux then it might be
easier to compare.  I'm sure you won't see much difference between Linux
and Windows on the same hardware.

Yesterday I tried indexing 24,000 files in my /usr/doc with 2.0.5,
2.1-dev-20 (basically the same), and the current dev version.  The current
version took 4 minutes.  The others had not finished after more than an
hour (those versions use a lot more RAM, so my machine was swapping).

Now, comparing spidering?  Good luck.

>Here's where it gets interesting: I set up the swishspider and unleashed
>them both on the same web site (very small -- just 19 unique pages) via HTTP
>crawl at the same general time (one just after another, late at night when
>volume was low; web server logs indicate that the spider was the only active
>session on the web site at the time).
>The time differences were massive: 
>Machine A (Linux / Swish 2.0):  21 minutes  (that's MINUTES, not seconds...)
>Machine B (Windows / Swish 2.1-dev-24):  14 seconds

I think you forgot to set the delay to zero.  By default, using -S http,
swish waits one minute between requests.
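
As a rough sanity check on that explanation, here is a minimal sketch of
the arithmetic. The function name and the optional per-page fetch time are
made up for illustration; only the 19 pages and the one-minute delay come
from the numbers above.

```python
def estimated_crawl_seconds(pages: int, delay_seconds: float,
                            fetch_seconds: float = 0.0) -> float:
    """Rough estimate: one delay between each pair of consecutive requests,
    plus an assumed (hypothetical) fetch time per page."""
    if pages <= 0:
        return 0.0
    return (pages - 1) * delay_seconds + pages * fetch_seconds


# 19 unique pages, with the one-minute default delay and with no delay.
slow = estimated_crawl_seconds(19, 60)
fast = estimated_crawl_seconds(19, 0)
print(f"{slow / 60:.0f} min with a 60 s delay, {fast:.0f} s with no delay")
```

With a one-minute delay the waiting alone comes to about 18 minutes for 19
pages, which is in the same range as the 21 minutes reported.  Any extra
requests the spider makes beyond the 19 unique pages would add a minute
each, but that part is a guess.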

Bill Moseley
