I did not index localhost.
What do you mean by user error?
I am using spider.config with these settings:
keep_alive => 1
delay_secs => 0
use_md5 => 1
I tried both the HTML and HTML* parser types; the speed is the same.
Is there anything else I can do to speed things up?
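
For a rough sense of scale, here is a back-of-the-envelope sketch that turns the timings quoted below into average seconds per page. The function name is made up for illustration, and the only inputs are the figures reported in this thread (600 pages; 70 minutes, 3 minutes, and the 17-second elapsed time in Bill's output).

```python
# Figures reported in this thread; nothing here is measured independently.
PAGES = 600


def seconds_per_page(total_seconds: float, pages: int = PAGES) -> float:
    """Average wall-clock seconds spent per page (hypothetical helper)."""
    return total_seconds / pages


slow_run = seconds_per_page(70 * 60)   # my swish-e run: 70 minutes
htdig_run = seconds_per_page(3 * 60)   # my htdig run: 3 minutes
local_run = seconds_per_page(17)       # Bill's run: Elapsed time 00:00:17

print(f"swish-e (70 min): {slow_run:.3f} s/page")
print(f"htdig (3 min):    {htdig_run:.3f} s/page")
print(f"Bill's run:       {local_run:.3f} s/page")
```

That works out to about 7 seconds per page for my swish-e run against 0.3 for htdig, while Bill's run spent only 2 seconds of CPU time on all 600 pages. So the extra time may be going into waiting between requests rather than into parsing.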
----- Original Message -----
From: "Bill Moseley" <firstname.lastname@example.org>
To: "John Angel" <email@example.com>
Cc: "Multiple recipients of list" <firstname.lastname@example.org>
Sent: Sunday, December 21, 2003 17:26
Subject: Re: [SWISH-E] Stats
> On Sun, Dec 21, 2003 at 06:31:47AM -0800, John Angel wrote:
> > When indexing the same site, containing 600 pages, using the same
> > settings for both indexers (persistent connection and md5 check):
> > - swish-e indexing time: 70 minutes
> > - htdig indexing time: 3 minutes
> > Any idea why that is?
> Yes, I do. User error. And, a failing grade for not showing your work,
> Summary for: http://localhost/doc/
> Duplicates: 5,193 (324.6/sec)
> Off-site links: 156 (9.8/sec)
> Skipped: 2 (0.1/sec)
> Total Bytes: 1,897,654 (118603.4/sec)
> Total Docs: 600 (37.5/sec)
> Unique URLs: 601 (37.6/sec)
> Removing very common words...
> no words removed.
> Writing main index...
> Sorting words ...
> Sorting 5,967 words alphabetically
> Writing header ...
> Writing index entries ...
> Writing word text: Complete
> Writing word hash: Complete
> Writing word data: Complete
> 5,967 unique words indexed.
> 4 properties sorted.
> 600 files indexed. 1,897,654 total bytes. 122,211 total words.
> Elapsed time: 00:00:17 CPU time: 00:00:02
> Indexing done!
> Still, htdig is likely faster at indexing. Thus, I would recommend that
> you use htdig.
> Bill Moseley
Received on Sun Dec 21 17:27:33 2003