On Mon, Nov 26, 2007 at 01:46:53PM +0100, Rene.Kloos@esa.int wrote:
>     test_response => sub {
>         my $server = $_[1];
>         $server->{no_index}++ if
>             $_[0]->path =~ /\/intranet\/communities\/TechnologyObservatory\/$/;

That seems like it should work. You might check it with something like:

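    test_response => sub {
        my ( $uri, $server ) = @_;

        # The first argument is a URI object, so match against its path.
        my $path = $uri->path;

        # Return true either way so the spider keeps processing the response.
        # The trailing slash is optional here; your pattern required it.
        return 1 unless
            $path =~ m!^/intranet/communities/TechnologyObservatory/?$!;

        $server->{no_index}++;
        warn "Found path to not index [$path]\n";
        return 1;
    },
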
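If you want to check the pattern on its own before running the spider, a small stand-alone script might help. This is only a sketch: is_no_index_path is a made-up helper (not part of spider.pl), the sample paths are invented, and the pattern here assumes the trailing slash should be optional, where yours required it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of spider.pl: returns 1 when the path is
# the page that should be spidered but not indexed, 0 otherwise.
sub is_no_index_path {
    my ($path) = @_;

    # Anchored at both ends; the trailing slash is optional (an assumption).
    return $path =~ m!^/intranet/communities/TechnologyObservatory/?$! ? 1 : 0;
}

# Sample paths, invented for illustration.
for my $path (
    '/intranet/communities/TechnologyObservatory',
    '/intranet/communities/TechnologyObservatory/',
    '/intranet/communities/TechnologyObservatory/page.html',
) {
    printf "%-58s %s\n", $path,
        is_no_index_path($path) ? 'no_index' : 'index';
}
```

Because the pattern is anchored with ^ and $, pages below that directory are left alone and only the directory page itself gets flagged.
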
And spider.pl is just Perl, so maybe:

    # Extract out links (if not too deep)
    my $links_extracted = extract_links( $server, \$content, $response )
        unless defined $server->{max_depth} && $depth >= $server->{max_depth};

    ## ADD SOMETHING LIKE ##
    if ( $server->{no_index} ) {
        warn "not sending content for '$uri'\n";
        use Data::Dumper;
        warn "But found links: " . Dumper( $links_extracted );
    }

--
Bill Moseley
moseley@hank.org
Received on Mon Nov 26 20:49:07 2007