On Fri, Jun 03, 2005 at 04:42:15AM -0700, Juan Salvador Castejón wrote:
> Hi,
>
> I'm indexing a web site using spider.pl. At the beginning, indexing
> was quite fast but as the process went ahead, it was slowing down
> significantly.
Are you using -e?
> The web site is in the intranet, so access time to web pages is very
> short. The indexing process has spent three days to index 100,000
> pages and it has not finished yet.
100,000 documents should not take that long. But filtering can slow
things way down.
If you have the disk space, break it into two steps:

    spider.pl | gzip > all_docs.gz
    gunzip -c all_docs.gz | swish-e -c config -S prog -i stdin -e
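
In case it helps, here is a rough sketch of those two steps as a small script. The SPIDER, INDEXER and DUMP variables and the two function names are made up for illustration, not anything swish-e defines; "config" is just the placeholder config name from the command above. gunzip is run with -c so the data goes to stdout and the .gz file stays on disk.

```shell
#!/bin/sh
# Two-step indexing sketch: spider first, index later.
# SPIDER, INDEXER, DUMP, fetch_docs and index_docs are illustrative
# names (assumptions), not part of swish-e or spider.pl.
SPIDER=${SPIDER:-spider.pl}
INDEXER=${INDEXER:-swish-e}
DUMP=${DUMP:-all_docs.gz}

# Step 1: run the spider once and keep its compressed output on disk.
fetch_docs() {
    $SPIDER | gzip > "$DUMP"
}

# Step 2: stream the saved output into the indexer.
# gunzip -c writes to stdout and leaves the .gz file in place,
# so the index can be rebuilt without spidering again.
index_docs() {
    gunzip -c "$DUMP" | $INDEXER -c config -S prog -i stdin -e
}
```

Run fetch_docs once, then index_docs as often as needed. Splitting it this way also shows which half is the slow one.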
> Mem: 503872k av, 491040k used, 12832k free, 0k shrd, 3280k buff
> 374028k actv, 88740k in_d, 2400k in_c
> Swap: 2096472k av, 1048440k used, 1048032k free 5076k cached
You are out of memory. You need -e (economy mode), which makes swish-e
use temporary files on disk instead of RAM while indexing.
>
> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
> 28007 root 15 0 636M 397M 540 D 0,0 80,7 67:48 1 swish-e
> 28008 root 16 0 327M 28M 968 S 0,0 5,8 62:41 1 spider.pl
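
To put those numbers side by side, here is a quick bit of shell arithmetic. The values are copied from the top output quoted above; the variable names are only for illustration.

```shell
# Values copied from the quoted top output (k = kilobytes).
mem_total_kb=503872     # Mem: av
swap_used_kb=1048440    # Swap: used
swish_size_mb=636       # SIZE of the swish-e process

# Integer division is close enough for a rough comparison.
mem_total_mb=$((mem_total_kb / 1024))
swap_used_mb=$((swap_used_kb / 1024))

echo "RAM: ${mem_total_mb}M, swish-e: ${swish_size_mb}M, swap used: ${swap_used_mb}M"
# prints: RAM: 492M, swish-e: 636M, swap used: 1023M
```

swish-e alone is bigger than physical RAM, and about 1 GB is already in swap. The D state with 0% CPU most likely means the process is sitting waiting on disk, which would fit with it spending its time swapping rather than indexing.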
--
Bill Moseley
moseley@hank.org
Received on Fri Jun 3 06:47:22 2005