HTTP Indexing Times for Different OSs and Swish Versions

From: Deane Barker <deane.barker(at)>
Date: Fri Jan 11 2002 - 20:06:18 GMT
Machine B (Windows / Swish 2.1-dev-24):  14 seconds 
This is not a fluke -- I did the same test several times and got the same
results.
The test is also informally mirrored.  I have Swish-E running at work on
Windows 2000 Professional, and a friend has it running on Mandrake Linux
8.1, each on the same version as the matching test machine (Windows at
2.1 dev, Linux at 2.0).  Performance in both instances is in line with the
respective times indicated above.
So, where does the difference come from?  It has to have something to do
with the spider, since both machines perform the same when indexing via the
file system.  Is it:
(1)  A difference in the versions?  I know that spidering and indexing time
was improved in the new release, but improved THAT much?  Wow.
(2)  A difference in the underlying operating systems?  Could Windows and
Linux handle HTTP requests and HTML parsing THAT differently?
I researched this on the discussion group and found this post: <> 
This indicates that the system will page at the tail end of the crawl when
it says "Writing index entries...".  However, that's not the problem here.
The Linux machine is just slow from page to page when indexing.  The output
says something like, "Retrieving page http://blah.blah..." and it just
sits...and sits...and sits...and then moves on.
Any ideas? 
Deane Barker 
The Sling and Rock Design Group 

