On Mon, Oct 27, 2003 at 07:28:21AM -0800, narayananps@hp.com wrote:
> But I am not able to get the spider to recurse through all the links
> in index.html.
> I see from the perl doc that the default HTML tag for links is <a>, so
> I don't specify it in my conf.
> Still I am not able to do a recursive spidering.
Enable debugging. spider.pl has one debug option that shows the links
extracted from each document and another that reports which links are
skipped and why.
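
A rough sketch of what that looks like in a spider config file. The
DEBUG_LINKS and DEBUG_SKIPPED names are taken from the spider.pl
documentation as I remember it, so check perldoc spider.pl for your
version; the URL and email address are placeholders.

```perl
# Fragment of a spider.pl config file (placeholder URL and address).
# DEBUG_LINKS and DEBUG_SKIPPED are constants defined by spider.pl itself,
# so this only works when loaded by spider.pl, not as a standalone script.
@servers = (
    {
        base_url => 'http://www.example.com/index.html',
        email    => 'you@example.com',

        # Print links as they are extracted, and say why links are skipped.
        debug    => DEBUG_LINKS | DEBUG_SKIPPED,
    },
);

1;
```

The same documentation also describes a SPIDER_DEBUG environment
variable (for example SPIDER_DEBUG=links,skipped) if you would rather
not edit the config file.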
> Also, is there a utility to configure the spider via a proxy on
> Windows? (I want to spider an external site from inside a firewall.)
Run perldoc LWP::UserAgent. It describes how to use a proxy with LWP
(which is what spider.pl uses). I don't think there's an easy way to
enable the proxy from within the config file, but it should be easy to
see where in spider.pl to call $ua->proxy or $ua->env_proxy.
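
For reference, here is a minimal standalone sketch of the two LWP calls.
The proxy host, port, and bypass hosts are made up, and where exactly
this belongs inside spider.pl depends on your copy of the script.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Option 1: name the proxy explicitly (hypothetical host and port).
$ua->proxy( [ 'http', 'ftp' ], 'http://proxy.example.com:8080/' );

# Option 2: read the proxy settings from environment variables such as
# http_proxy and no_proxy instead of hard-coding them.
# $ua->env_proxy;

# Hosts that should be fetched directly, bypassing the proxy
# (hypothetical names).
$ua->no_proxy( 'localhost', 'intranet.example.com' );

# Called with only a scheme, proxy() returns the current setting.
print "http proxy: ", $ua->proxy('http'), "\n";
```

With env_proxy you would set http_proxy in the environment before
running the spider, which on Windows means using "set" in the command
prompt first.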
--
Bill Moseley
moseley@hank.org
Received on Mon Oct 27 16:35:20 2003