Re: spider.pl using a lot of RAM

From: <moseley(at)not-real.hank.org>
Date: Fri Jun 13 2003 - 05:11:07 GMT
On Thu, Jun 12, 2003 at 06:10:35PM -0700, Aaron Bazar wrote:
> Does anybody know why spider.pl would use up 200 Megs of RAM? It does not
> normally use so many resources... I am hesitant to stop it because I have
> already grabbed a lot of pages, but I am curious if anybody else has seen
> this.

Yes.  In the past there was a problem with it using a lot of memory and 
the fix was upgrading the URI module.  

I brought up the problem a number of times on the LWP list.  Here's one:

  http://www.xray.mpe.mpg.de/mailing-lists/libwww-perl/2001-07/msg00053.html

I also switched from running a recursive spider to a list-based spider 
at that time when trying to fix the problem.
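
For readers unfamiliar with the distinction, here is a rough sketch of a 
list-based crawl. It is written in Python purely for illustration and is 
not spider.pl's actual code; the crawl() function, the get_links callback 
and the tiny in-memory "site" are all hypothetical. Instead of a function 
calling itself for every link it finds, the loop works through a list of 
pending URLs:

```python
from collections import deque


def crawl(start_url, get_links):
    """Visit every URL reachable from start_url without recursion."""
    seen = {start_url}            # every URL ever queued is remembered here
    pending = deque([start_url])  # the work list that replaces recursion
    order = []                    # URLs in the order they were visited

    while pending:
        url = pending.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:  # queue each URL only once
                seen.add(link)
                pending.append(link)
    return order


# Hypothetical four-page site standing in for real HTTP fetches.
site = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/"],
    "/b": ["/c"],
    "/c": [],
}

print(crawl("/", lambda url: site.get(url, [])))
# prints ['/', '/a', '/b', '/c']
```

The call depth stays constant however deep the site is, but note that 
the "seen" set still grows with every URL encountered.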

On the other hand, the spider does store all accessed URLs in memory, so 
if you are indexing a very large number of files it would use more 
memory (though 200MB seems unlikely).
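
A back-of-the-envelope check, again in Python and only as a sketch: the 
average URL length and the per-entry overhead below are assumed figures 
for illustration, not measurements of spider.pl or of Perl hashes. Under 
those assumptions, stored URLs alone would account for 200MB only after 
roughly a million distinct URLs:

```python
def urls_that_fit(budget_bytes, avg_url_len=80, per_entry_overhead=120):
    """Estimate how many stored URLs fit in a memory budget.

    avg_url_len and per_entry_overhead are assumed values, in bytes.
    """
    return budget_bytes // (avg_url_len + per_entry_overhead)


budget = 200 * 1024 * 1024       # 200MB expressed in bytes
print(urls_that_fit(budget))     # prints 1048576
```

If the crawl has seen far fewer URLs than that, the URL list is probably 
not what is consuming the memory.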

If neither of those is the issue then we will need to look more closely 
at what you are indexing and try to find the leak.


-- 
Bill Moseley
moseley@hank.org
Received on Fri Jun 13 05:11:48 2003