
Re: Default config

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Dec 03 2003 - 19:03:52 GMT
On Wed, Dec 03, 2003 at 04:21:10AM -0800, John Angel wrote:
> Is it possible to have default configuration for all sites (using Prog 
> mode)?
> 
> E.g. I want to index several sites using the same settings for agent, email, 
> keep_alive, etc. Is it necessary to repeat all those parameters for every 
> website in the list?

I'm not quite clear.  You want to reuse the settings for different
sites?

The spider config file is loaded with a do() call in Perl, which simply
executes the code in that file.  All that file is required to do is set
a variable called @servers.  So the file can do anything -- it can read
its parameters from a database if you like.
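Since the config file is just Perl, @servers can be built from any external source. Here is a minimal sketch that reads one URL per line from a plain text file; the file name "hosts.txt" and its format are my assumptions for illustration, not anything the spider requires:

```perl
# Sketch: fill @servers from an external list instead of hard-coding it.
# "hosts.txt" (one base URL per line) is a hypothetical example file.
open my $fh, '<', 'hosts.txt' or die "Cannot open hosts.txt: $!";
while ( my $url = <$fh> ) {
    chomp $url;
    next unless $url;                    # skip blank lines
    push @servers, { base_url => $url }; # one server entry per URL
}
close $fh;
```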

If you just want to use the same settings for a bunch of servers that
get indexed at the same time, you can do something like this:

my %default_config = (
    agent       => 'swish-e spider http://swish-e.org/',
    email       => 'swish@domain.invalid',

    # limit to .htm/.html files
    test_url    => sub { $_[0]->path =~ /\.html?$/ },
);

my @hosts_to_spider = qw(
    http://first.host/index.html
    http://second.host/index.html
    http://third.host/index.html
);

# Now push the configs onto the @servers array

for my $cur_host ( @hosts_to_spider ) {
    my %this_host = (
        %default_config,      # copy in the shared defaults
        base_url => $cur_host,
    );
    push @servers, \%this_host;
}


> Another problem with that is the local setting of use_md5=1. How do I avoid 
> duplicates from different servers if this option is not set globally?

The md5 hashes are stored globally, but the option to check them is per
server.
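So one way to get the effect of a global setting is to fold use_md5 into the shared defaults, since every generated server entry copies them in. A sketch extending the %default_config hash from the earlier example:

```perl
my %default_config = (
    agent    => 'swish-e spider http://swish-e.org/',
    email    => 'swish@domain.invalid',
    use_md5  => 1,   # enable the MD5 check on every entry built from these defaults
);
```

Because the hashes themselves are stored globally, duplicate content is then caught across servers even though the option is set per server.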


-- 
Bill Moseley
moseley@hank.org
Received on Wed Dec 3 19:04:16 2003