
Re: incremental indexing and spidering

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Sun Oct 15 2006 - 20:06:35 GMT
Brandon Shalton scribbled on 10/10/06 11:54 AM:

> ideally i would like to do:
> 
> spider.pl -c config.file http://www.somewebsite.com | swish-e -S prog -i stdin
> 

that command works for me with 2.4.4:

perl spider.pl default http://swish-e.org/docs | swish-e -S prog -i stdin -W0
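
If you want each of those parallel runs to write its own index, -f should do
it. Something like this (untested, and the URLs and index names are just
placeholders):

  # one pipeline per site, each writing to its own index file
  perl spider.pl default http://site1.example.com | swish-e -S prog -i stdin -W0 -f site1.idx
  perl spider.pl default http://site2.example.com | swish-e -S prog -i stdin -W0 -f site2.idx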


> where the idea is to not spider the mirror copy to disk, but to be able to 
> pump directly into swish with the incremental index, such that i could have 
> 200 of these command lines running, indexing to their individual 200 .idx 
> files
> 
> at the end of the day, i would merge the 200 .idx files into 1 daily index 
> file
> 
> i tried a while ago to use the experimental incremental indexing, but i 
> couldn't get it all to work as described above.
> 

were you able to build swish-e with the --enable-incremental feature?
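
If not, the usual source build with that switch looks roughly like this
(standard autoconf steps, untested here):

  ./configure --enable-incremental
  make
  make install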

I'm not sure you even need that feature, given what you're describing above. 
If you plan to run multiple spiders and then merge their indexes, you aren't 
actually using the incremental feature at all.
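
For the end-of-day merge, -M should cover it. Roughly like this (again
untested; the index names are placeholders and the last argument is the
output index):

  swish-e -M site1.idx site2.idx site3.idx daily.idx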

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com