Re: Stats

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun Dec 21 2003 - 16:29:05 GMT
On Sun, Dec 21, 2003 at 06:31:47AM -0800, John Angel wrote:
> When indexing the same site, containing 600 pages, using the same settings
> for both indexers (persistent connection and md5 check):
> 
> - swish-e indexing time: 70 minutes
> - htdig indexing time: 3 minutes
> 
> Any ideas why's that?

Yes, I do.  User error.  And, a failing grade for not showing your work, 
again.


Summary for: http://localhost/doc/
    Duplicates:     5,193  (324.6/sec)
Off-site links:       156  (9.8/sec)
       Skipped:         2  (0.1/sec)
   Total Bytes: 1,897,654  (118603.4/sec)
    Total Docs:       600  (37.5/sec)
   Unique URLs:       601  (37.6/sec)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 5,967 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
5,967 unique words indexed.
4 properties sorted.                                              
600 files indexed.  1,897,654 total bytes.  122,211 total words.
Elapsed time: 00:00:17 CPU time: 00:00:02
Indexing done!
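
A quick check of the arithmetic, using only the figures in the log above. Every per-second rate in the spider summary implies a crawl of about 16 seconds, which matches the 17-second elapsed time. The 70-minute figure from the original question is used only for comparison. This is an illustrative sketch, not part of swish-e.

```python
# Figures copied from the spider summary and indexer output above.
total_docs = 600          # "Total Docs: 600 (37.5/sec)"
docs_per_sec = 37.5
total_bytes = 1_897_654   # "Total Bytes: 1,897,654 (118603.4/sec)"
bytes_per_sec = 118603.4
elapsed_sec = 17          # "Elapsed time: 00:00:17"
reported_sec = 70 * 60    # the 70 minutes reported in the question

# Crawl duration implied by each rate the spider printed.
crawl_sec = total_docs / docs_per_sec
crawl_sec_from_bytes = total_bytes / bytes_per_sec

print(f"implied crawl time (docs):  {crawl_sec:.1f} s")
print(f"implied crawl time (bytes): {crawl_sec_from_bytes:.1f} s")
print(f"reported / measured: {reported_sec / elapsed_sec:.0f}x")
```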

Still, htdig is likely faster at indexing.  Thus, I would recommend that
you use htdig.



-- 
Bill Moseley
moseley@hank.org
Received on Sun Dec 21 16:29:20 2003