Re: 24 hours of spidering and still going

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Aug 17 2001 - 18:49:50 GMT
At 10:57 AM 08/17/01 -0700, Don Fike wrote:
>I used the simplest example given for the use of prog spidering yesterday
>about this time.

First, read 
http://sunsite.berkeley.edu/SWISH-E/2.2/docs/INSTALL.html#QUESTIONS_AND_TROUBLESHOOTING

That will help us know what you are doing.

Are you spidering with -S prog (spider.pl) or -S http?  You say "prog" above, but I'm not sure if that's what you mean.

>It's still going and this file, prog_icl_external.index.prop.temp, has just
>reached 50 meg.

It all depends on what you are putting in the .prop file.  If you are using StoreDescription then, sure, that's easy, and 50M would not be considered very much.  (The plan is to compress that data to save disk and I/O.)

>The site that it was to spider has about 4000 files, would you say this
>behavior is normal?

No.  4000 files / 24 hours = ~2.8 files per minute.  I would hope your web server can return documents faster than that.  Do you have a delay set in the spider?
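
For anyone wanting to check the arithmetic, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes all 4000 files had been fetched in the 24 hours. Since the spider was still running, the real rate would be even lower.

```python
def pages_per_minute(total_pages, elapsed_hours):
    """Average fetch rate, in pages per minute."""
    return total_pages / (elapsed_hours * 60)


def seconds_per_page(total_pages, elapsed_hours):
    """Average time spent on each page, in seconds."""
    return elapsed_hours * 3600 / total_pages


# Figures from the message: about 4000 files, 24 hours of spidering.
rate = pages_per_minute(4000, 24)
gap = seconds_per_page(4000, 24)
print(f"{rate:.1f} pages/minute, about {gap:.1f} seconds per page")
# prints "2.8 pages/minute, about 21.6 seconds per page"
```

Roughly 20 seconds per document is far longer than a typical fetch, which is why a configured delay between requests is the first thing to rule out.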

>Is it September 9th yet?

Which year?



Bill Moseley
mailto:moseley@hank.org
Received on Fri Aug 17 19:15:25 2001