Re: [swish-e] $server->{no_index}

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Tue Nov 27 2007 - 01:49:09 GMT
On Mon, Nov 26, 2007 at 01:46:53PM +0100, Rene.Kloos@esa.int wrote:
> test_response => sub {
>     my $server = $_[1];
>     $server->{no_index}++ if
>         $_[0]->path =~ /\/intranet\/communities\/TechnologyObservatory\/$/;

That looks like it should work.  You might check with something like:

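    test_response => sub {
        my ( $uri, $server ) = @_;

        # $uri is a URI object, so match against its path only.
        # The trailing slash is optional to cover both forms of the URL.
        return 1 unless
            $uri->path =~ m!^/intranet/communities/TechnologyObservatory/?$!;

        $server->{no_index}++;
        warn "Found path to not index [$uri]\n";
        return 1;
    },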
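
It can also help to try the pattern outside the spider first. The following is only a sketch: is_skipped_path is a made-up helper name (not part of spider.pl), and plain strings stand in for what the URI object's path method would return for each request.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of spider.pl): true when the path is the
# directory whose content should not be indexed. The optional trailing
# slash accepts both forms of the URL.
sub is_skipped_path {
    my ($path) = @_;
    return $path =~ m{^/intranet/communities/TechnologyObservatory/?$} ? 1 : 0;
}

# Plain strings stand in for the path portion of each fetched URL.
# The last entry is a full URL, to show what happens if the pattern
# is applied to the whole URL instead of just the path.
for my $candidate (
    '/intranet/communities/TechnologyObservatory',
    '/intranet/communities/TechnologyObservatory/',
    '/intranet/communities/TechnologyObservatory/page.html',
    'http://localhost/intranet/communities/TechnologyObservatory/',
) {
    printf "%-65s %s\n", $candidate,
        is_skipped_path($candidate) ? 'skip' : 'index';
}
```

The first two entries match and the last two do not. The full URL fails because of the leading ^ anchor, which is why the pattern needs to be run against the path rather than the whole URL.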

And spider.pl is just Perl, so you could add a debugging check right after the link extraction, maybe:

    # Extract out links (if not too deep)

    my $links_extracted = extract_links( $server, \$content, $response )
        unless defined $server->{max_depth} && $depth >= $server->{max_depth};

    ## ADD SOMETHING LIKE ##
    if ( $server->{no_index} ) {
        warn "not sending content for '$uri'\n";
        use Data::Dumper; 
        warn "But found links: " . Dumper( $links_extracted );
    }
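
If the Dumper output gets noisy, it can be trimmed a bit. This is a rough sketch only: format_links is a made-up helper, and the link list here is made-up data that assumes $links_extracted is an array reference (check what extract_links really returns in your copy of spider.pl).

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: turn a list of links into a compact debug string.
sub format_links {
    my ($links) = @_;
    local $Data::Dumper::Terse  = 1;    # drop the "$VAR1 = " prefix
    local $Data::Dumper::Indent = 1;    # fixed, shallow indentation
    return Dumper( $links || [] );      # treat "no links" as an empty list
}

# Made-up data standing in for whatever extract_links() returned.
my $links_extracted = [
    'http://localhost/a.html',
    'http://localhost/b.html',
];

warn "But found links: " . format_links($links_extracted);
```

Using local keeps the Data::Dumper settings from leaking into any other debugging output in the script.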

-- 
Bill Moseley
moseley@hank.org

Received on Mon Nov 26 20:49:07 2007