On Sun, Dec 21, 2003 at 06:31:47AM -0800, John Angel wrote:
> When indexing the same site, containing 600 pages, using the same settings
> for both indexers (persistent connection and md5 check):
> - swish-e indexing time: 70 minutes
> - htdig indexing time: 3 minutes
> Any ideas why that is?
Yes, I do. User error. And a failing grade for not showing your work. Here's mine:
Summary for: http://localhost/doc/
Duplicates: 5,193 (324.6/sec)
Off-site links: 156 (9.8/sec)
Skipped: 2 (0.1/sec)
Total Bytes: 1,897,654 (118603.4/sec)
Total Docs: 600 (37.5/sec)
Unique URLs: 601 (37.6/sec)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 5,967 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
5,967 unique words indexed.
4 properties sorted.
600 files indexed. 1,897,654 total bytes. 122,211 total words.
Elapsed time: 00:00:17 CPU time: 00:00:02
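The per-second figures in the spider summary make a quick sanity check possible. The sketch below assumes (the report does not say so) that each rate is the count divided by the spidering wall-clock time, and recovers that time from the four larger counts. Off-site links and Skipped are left out because their one-decimal rates are too coarse to divide back cleanly. The names `summary` and `implied_seconds` are only for illustration.

```python
# Counts and per-second rates copied from the spider summary above.
# Assumption: each rate = count / spidering wall-clock seconds.
summary = {
    "Duplicates": (5193, 324.6),
    "Total Bytes": (1897654, 118603.4),
    "Total Docs": (600, 37.5),
    "Unique URLs": (601, 37.6),
}

def implied_seconds(count, rate):
    """Return the elapsed time implied by a count and its per-second rate."""
    return count / rate

for label, (count, rate) in summary.items():
    # Each line prints the recovered time, rounded to one decimal place.
    print(f"{label}: {implied_seconds(count, rate):.1f} s")
```

Every line comes out at 16.0 s, which fits the reported 17-second elapsed time and is nowhere near 70 minutes (4,200 s).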
Still, htdig is likely faster at indexing. Thus, I would recommend that
you use htdig.
Received on Sun Dec 21 16:29:20 2003