On Wed, Apr 07, 2004 at 10:28:07PM -0700, Mark Greenaway wrote:
> OK, I am not that familiar with Perl.
> Does anyone have a modified copy of spider.pl or swishspider that allows
> swish-e to index off-site or external links as well as local ones?
Well, this looks like the code that checks for a matching host name:

    # Here we make sure we are looking at a link pointing to the correct (or equivalent) host
    unless ( $server->{scheme} eq $u->scheme && $server->{same_host_lookup}{$u->canonical->authority||''} ) {
        print STDERR qq[ ?? <$tag $attribute="$u"> skipped because different host\n] if $server->{debug} & DEBUG_LINKS;
        $server->{counts}{'Off-site links'}++;
        validate_link( $server, $u, $base ) if $server->{validate_links};
        return;
    }

    $u->host_port( $server->{authority} );  # Force all the same host name

so you could try removing that code from a copy of spider.pl.
Then hope max_depth works right.
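
As a rough, untested-against-spider.pl sketch of what a relaxed check might
look like: the snippet below is a standalone script, not a patch. The
"allow_offsite" key and the check_link() function are made up for
illustration and are not real spider.pl options; only the scheme,
authority and same_host_lookup keys come from the code quoted above. Note
that the host_port() rewrite presumably has to be skipped for off-site
links, otherwise every external URL would be forced onto the local host.

```perl
use strict;
use warnings;
use URI;

# Hypothetical stand-in for the spider's per-server settings.
# "allow_offsite" is invented for this sketch; spider.pl has no such key.
my $server = {
    scheme           => 'http',
    authority        => 'www.example.com',
    same_host_lookup => { 'www.example.com' => 1, 'example.com' => 1 },
    allow_offsite    => 1,
};

# Return the URI to spider, or undef if the link should be skipped.
sub check_link {
    my ( $server, $u ) = @_;

    # Same test as in the quoted code: matching scheme and a known host name.
    my $same_host = $server->{scheme} eq ( $u->scheme || '' )
        && $server->{same_host_lookup}{ $u->canonical->authority || '' };

    if ($same_host) {
        # Only equivalent hosts get rewritten to the one canonical name.
        my $copy = $u->clone;
        $copy->host_port( $server->{authority} );
        return $copy;
    }

    # Off-site link: keep it untouched if allowed, otherwise skip it.
    return $server->{allow_offsite} ? $u : undef;
}

for my $link ( 'http://example.com/a', 'http://other.example.org/x' ) {
    my $u = check_link( $server, URI->new($link) );
    print defined $u ? "follow $u\n" : "skip $link\n";
}
```

With allow_offsite set to 0 this behaves like the original check (minus the
debug output, counters and link validation, which are left out here).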
--
Bill Moseley
moseley@hank.org
Received on Thu Apr 8 03:48:28 2004