Re: Default config

From: Bill Moseley <moseley(at)>
Date: Wed Dec 03 2003 - 19:03:52 GMT

On Wed, Dec 03, 2003 at 04:21:10AM -0800, John Angel wrote:
> Is it possible to have default configuration for all sites (using Prog 
> mode)?
> E.g. I want to index several sites using the same settings for agent, email, 
> keep_alive, etc. Is it necessary to repeat all those parameters for all 
> websites in the list?

I'm not quite clear.  You want to reuse the settings for different

The spider config file is loaded with a do() call in Perl, which simply
executes the commands in that file.  All that's required by that file is
to set a variable called @servers.  So, that file can do anything -- it
can read parameters from a database if you like.
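
To make the do() mechanism concrete, here is a minimal, self-contained
sketch.  It is not the spider's own loading code: the temporary file, the
host names and the error handling are all assumptions made for
illustration.  It only shows that a file loaded with do() can run
arbitrary Perl and that the caller afterwards sees whatever the file put
into @servers.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our @servers;   # package variable the loaded file is expected to fill

# Write a tiny stand-in config file (hypothetical contents).
my ($fh, $config_file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'END_CONFIG';
# Any Perl is allowed here; the only requirement is to set @servers.
@servers = map {
    +{ base_url => "http://$_.invalid/", email => 'swish@domain.invalid' }
} qw(host-a host-b);
1;
END_CONFIG
close $fh or die "close failed: $!";

# do() compiles and runs the file, returning its last evaluated value.
my $rv = do $config_file;
die "could not parse $config_file: $@" if $@;
die "could not read $config_file: $!"  unless defined $rv;

print scalar(@servers), " servers configured\n";
print "$_->{base_url}\n" for @servers;
```

Because the file is ordinary Perl, the map over a word list could just
as well be a loop over rows fetched from a database or lines read from
a text file.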

If you just want to use the same settings for a bunch of servers that
get indexed at the same time you can do something like this:

my %default_config = (
    agent       => 'swish-e spider',
    email       => 'swish@domain.invalid',

    # limit to only .html files
    test_url    => sub { $_[0]->path =~ /\.html?$/ },
);

my @hosts_to_spider = qw(

# Now push the configs onto the @servers array

for my $cur_host ( @hosts_to_spider ) {
    my %this_host = (
        base_url => $cur_host,
        %default_config,  # copy in default parameters
    );
    push @servers, \%this_host;
}

> Another problem with that is the local setting of use_md5=1. How to avoid 
> duplicates from different servers if this option is not set globally?

The md5 hashes are stored globally, but the option to check them is per
server.
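
One way to combine the two answers is to put use_md5 into the shared
defaults, so that every server entry carries it.  The following is only
a sketch under assumptions: the host URLs are made up, the
"our @servers" declaration may need adjusting to however your spider
config declares that variable, and the check at the end is just a sanity
test, not something the spider requires.

```perl
use strict;
use warnings;

our @servers;   # assumption: adjust to how your config file declares it

my %default_config = (
    agent      => 'swish-e spider',
    email      => 'swish@domain.invalid',
    keep_alive => 1,
    use_md5    => 1,   # checked per server entry, so set it in the defaults
);

# Hypothetical hosts; replace with the real base URLs.
my @hosts_to_spider = qw(
    http://host-a.invalid/
    http://host-b.invalid/
);

# Each entry is a fresh hash: the defaults first, then the per-host value.
for my $cur_host (@hosts_to_spider) {
    push @servers, { %default_config, base_url => $cur_host };
}

# Sanity check: every entry should have the flag turned on.
my @missing = grep { !$_->{use_md5} } @servers;
die "use_md5 missing for: @{[ map { $_->{base_url} } @missing ]}\n" if @missing;
printf "%d servers, all with use_md5 set\n", scalar @servers;

1;
```

Listing the per-host keys after %default_config means a host-specific
value would win over a default of the same name, which is handy if one
server ever needs a different setting.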

Bill Moseley
Received on Wed Dec 3 19:04:16 2003