Re: Indexing link contents

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Fri Mar 07 2003 - 15:17:11 GMT
On Fri, 7 Mar 2003, Ander wrote:

> >By link contents you mean the value of the href attribute in an <a> tag?

> No. With link contents, I mean the target of the link. I mean, I want to
> spider my own site and the sites (links) contained in my site. Is there a 
> way to do that?

Well, yes, that's the "spider" part of spider.pl. ;)

It fetches a page, indexes it, extracts the links, and then repeats the
process on each extracted link that points to the same host.
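
As a rough illustration of that loop (not spider.pl's actual code, which is
Perl), here is a minimal Python sketch. The `crawl` function, the `fetch`
callable, and the in-memory `PAGES` table are all made up for the example, so
nothing here touches the network, and "indexing" is reduced to recording the
URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl limited to the host of start_url.

    `fetch` is a hypothetical callable mapping a URL to its HTML,
    or to None when the page is unavailable.
    """
    start_host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    indexed = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        indexed.append(url)  # stand-in for the real indexing step
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if urlsplit(target).netloc != start_host:
                continue  # link points to a different host: skip it
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed


# Fake site used instead of real HTTP requests (assumption for the demo).
PAGES = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="http://other.example/x">X</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.com/b.html": "no links here",
}

print(crawl("http://example.com/", PAGES.get))
```

The link to other.example is extracted but never followed, because its host
differs from the starting host.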

There are settings in the spider to define what the "same host" is (see
"same_hosts" in the spider.pl docs: perldoc spider.pl), and you can also
define more than one host to spider in the config file.
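
To show the idea behind treating several host names as the "same host", here
is a small Python sketch. It is only an assumption about how such a check
could work, not the spider's own implementation; `is_same_host` and its
`aliases` parameter are hypothetical names invented for this example. See
perldoc spider.pl for the real setting.

```python
from urllib.parse import urlsplit


def is_same_host(url, base_url, aliases=()):
    """Return True if url's host is base_url's host or a listed alias.

    Hypothetical helper: `aliases` plays the role of extra host names
    that should be treated as equivalent to the base host.
    """
    host = (urlsplit(url).hostname or "").lower()
    allowed = {(urlsplit(base_url).hostname or "").lower()}
    allowed.update(alias.lower() for alias in aliases)
    return host in allowed


base = "http://www.example.com/"
aliases = ["example.com"]

print(is_same_host("http://example.com/docs/", base, aliases))
print(is_same_host("http://other.example/", base, aliases))
```

With the alias list above, links to example.com count as the same site as
www.example.com, while links to other.example are still treated as external.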



-- 
Bill Moseley moseley@hank.org
Received on Fri Mar 7 15:21:01 2003