
incremental indexing and spidering

From: Brandon Shalton <brandon(at)not-real.cydataservices.com>
Date: Tue Oct 10 2006 - 16:55:33 GMT
Greetings,

I am a long-time user of swish, and I am greatly encouraged by the
incremental indexing option.

I do a lot of spidering to map how websites link to each other (my database
is over 1B records), and I want to do keyword indexing as well.

Ideally, I would like to do:

spider.pl -c config.file http://www.somewebsite.com | swish-e -S prog -i stdin

The idea is not to spider a mirror copy to disk, but to pipe the pages
directly into swish with the incremental index, such that I could have 200
of these command lines running at once, each indexing to its own .idx file
(200 in all).
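
A minimal shell sketch of that fan-out, assuming a sites.txt with one URL
per line (the file name and the daily.$n.idx naming are just placeholders),
and a swish-e built with the incremental option:

    n=0
    while read -r url; do
        n=$((n+1))
        # one spider/index pipeline per site, each writing its own index;
        # </dev/null keeps the backgrounded spider off the loop's stdin
        spider.pl -c config.file "$url" </dev/null | \
            swish-e -S prog -i stdin -f "daily.$n.idx" &
    done < sites.txt
    wait    # block until every background pipeline has finished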

At the end of the day, I would merge the 200 .idx files into one daily
index file.
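
As I read the docs, the merge would be swish-e's -M switch, which takes the
input indexes followed by the output index as its last argument (the index
names here are the placeholders from the sketch above):

    # merge all per-worker indexes into a single daily index
    swish-e -M daily.*.idx daily-merged.idx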

I tried a while ago to use the experimental incremental indexing, but I
couldn't get it all to work as described above.
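
For reference, I built it roughly like this (assuming --enable-incremental
is still the configure switch for the experimental code):

    # incremental indexing is experimental and off by default
    ./configure --enable-incremental && make && make install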

Any pointers would be greatly appreciated.

-brandon
Received on Tue Oct 10 09:56:38 2006