Re: Slow indexing speed

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Jun 03 2005 - 13:47:22 GMT
On Fri, Jun 03, 2005 at 04:42:15AM -0700, Juan Salvador Castejón wrote:
> Hi,
> 
> I'm indexing a web site using spider.pl. At the beginning, indexing
> was quite fast but as the process went ahead, it was slowing down
> significantly.

Are you using -e?


> The web site is in the intranet, so access time to web pages is very
> short. The indexing process has spent three days to index 100,000
> pages and it has not finished yet.

100,000 documents should not take that long.  But filtering can slow
things way down.

Got disk space?  Break it into two steps.

   spider.pl | gzip > all_docs.gz

   gunzip -c all_docs.gz | swish-e -c config -S prog -i stdin -e
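
The two steps could be wrapped in a small script along these lines. This is only a sketch: the command names come from the commands above, while the function names, the variables and the integrity check with gzip -t are illustrative assumptions, not something spider.pl or swish-e provides.

```shell
#!/bin/sh
# Sketch: run the spider and the indexer as two separate steps, so the
# fetched documents sit on disk instead of both programs running at once.
# SPIDER_CMD, INDEX_CMD, ARCHIVE and the function names are hypothetical.

SPIDER_CMD=${SPIDER_CMD:-spider.pl}
INDEX_CMD=${INDEX_CMD:-swish-e -c config -S prog -i stdin -e}
ARCHIVE=${ARCHIVE:-all_docs.gz}

# Step 1: fetch everything and store the spider's output compressed.
# Note: the pipeline's exit status is gzip's, not the spider's.
fetch_docs() {
    $SPIDER_CMD | gzip > "$ARCHIVE"
}

# Step 2: verify the archive, then stream it into the indexer.
index_docs() {
    gzip -t "$ARCHIVE" || return 1    # refuse to index a truncated archive
    gunzip -c "$ARCHIVE" | $INDEX_CMD
}
```

Run it as "fetch_docs && index_docs". If indexing fails, only the second step needs to be repeated; the site does not have to be spidered again.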

> Mem:   503872k av,  491040k used,   12832k free,       0k shrd,    3280k buff
>                     374028k actv,   88740k in_d,    2400k in_c
> Swap: 2096472k av, 1048440k used, 1048032k free                    5076k cached

You are out of memory.  You need -e.
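
To put numbers on that, here is a rough calculation using the Mem and Swap figures quoted above. pct_used is a hypothetical helper written for this example, not part of top or swish-e.

```shell
# Sketch: percentage in use, given "total" and "used" in kB as top prints them.
# pct_used is a hypothetical helper name.
pct_used() {
    awk -v total="$1" -v used="$2" 'BEGIN { printf "%.0f\n", 100 * used / total }'
}

pct_used 503872 491040      # RAM:  total, used
pct_used 2096472 1048440    # swap: total, used
```

That works out to roughly 97% of RAM and 50% of swap in use. With about 1 GB already swapped out, the indexer is most likely spending its time paging rather than indexing.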


> 
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
> 28007 root      15   0  636M 397M   540 D     0,0 80,7  67:48   1 swish-e
> 28008 root      16   0  327M  28M   968 S     0,0  5,8  62:41   1 spider.pl

-- 
Bill Moseley
moseley@hank.org

Unsubscribe from or help with the swish-e list: 
   http://swish-e.org/Discussion/

Help with Swish-e:
   http://swish-e.org/current/docs
   swish-e@sunsite.berkeley.edu
Received on Fri Jun 3 06:47:22 2005