To get swish to index external links, modify spider.pl (v 2.4.2) as follows:
881 # Here we make sure we are looking at a link pointing to the correct (or equivalent) host
882
883 # unless ( $server->{scheme} eq $u->scheme && $server->{same_host_lookup}{$u->canonical->authority||''} ) {
884 #
885 # print STDERR qq[ ?? <$tag $attribute="$u"> skipped because different host\n] if $server->{debug} & DEBUG_LINKS;
886 # $server->{counts}{'Off-site links'}++;
887 # validate_link( $server, $u, $base ) if $server->{validate_links};
888 # return;
889 # }
890
891 # $u->host_port( $server->{authority} ); # Force all the same host name
892
893 # Allow rejection of this URL by user function
That is, comment out lines 883-891.
This still obeys the max_depth setting, which is extremely important;
otherwise you could spider the world. If you have max_depth set to more
than 1, you had better know what you are doing.
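
For anyone unsure where max_depth goes, here is a rough sketch of a spider
config file (the SwishSpiderConfig.pl style that spider.pl reads). The
base_url and email values are made-up placeholders, and I am assuming the
usual @servers layout from the spider.pl documentation, so check it against
your own copy before relying on it.

```perl
# Sketch of a spider.pl config file; base_url and email are placeholders.
# spider.pl declares @servers itself and loads this file, so it is
# assigned here without "my".
@servers = (
    {
        base_url  => 'http://www.example.com/',   # hypothetical start page
        email     => 'webmaster@example.com',     # hypothetical contact address
        max_depth => 1,   # base_url page plus the pages it links to, no further
    },
);

1;   # the config file should end with a true value
```

With the off-site check commented out, max_depth is the main thing in this
sketch keeping the crawl bounded, so a value of 0 or 1 is the cautious choice.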
Thanks to Bill Moseley (the only one who seems active), though I had
worked it out myself beforehand and then went on my Easter break.
Received on Mon Apr 12 15:45:47 2004