Re: Swish-E and HTML documents with frames

From: Chris Humphries <ChrisJMH(at)not-real.vermilion99.freeserve.co.uk>
Date: Sat Feb 26 2000 - 11:43:29 GMT

Dear Ron,

You wrote: "But if you find one of the pages that is indirectly referenced, 
you get the page only."

This is very true, and if one were spidering indiscriminately, it would be 
a problem because there is probably no way of knowing that the page you had 
found *was* indirectly referenced. However, most of my indexing so far has 
been just the first page of a Web site, which means that my approach to 
reading through the frames is probably safe. Each Web site will already 
have been looked at by a human being and its basic structure understood.

If you can think of a case that the approach I am using does not handle, 
I would really appreciate it if you could supply a URL for me to try out.

Many thanks,

Chris Humphries

-----Original Message-----
From:	Ron Samuel Klatchko [SMTP:rsk@brightmail.com]
Sent:	Saturday, February 26, 2000 1:29 AM
To:	ChrisJMH@vermilion99.freeserve.co.uk
Cc:	Multiple recipients of list
Subject:	Re: [SWISH-E] Re: Swish-E and HTML documents with frames

Chris Humphries wrote:
> The way my system works, all the "frame src" links are read to create one
> big file, and *any* "a href" links found in any of those files are
> returned as if they were from that one big file. This means that to get
> at any <A> tags in the HTML pages you describe, one would need to set the
> spider to read to a depth of 2.
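
[Editorial illustration, not part of the original message.] The merging
described in the quoted text could be sketched roughly as follows. This is
not Chris's actual code, which the thread does not show; it is a minimal
Python sketch under stated assumptions. The names `read_frameset`,
`TagCollector`, `fetch`, and the `PAGES` sample data are all hypothetical,
nested framesets are not followed, and `fetch` stands in for whatever
retrieves a page by URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TagCollector(HTMLParser):
    """Collect <frame src> and <a href> targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []  # absolute URLs taken from <frame src="...">
        self.links = []   # absolute URLs taken from <a href="...">

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        attrs = dict(attrs)
        if tag == "frame" and attrs.get("src"):
            self.frames.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))


def read_frameset(url, fetch):
    """Return (merged_text, links) for the frameset page at `url`.

    `fetch` is an assumed callable mapping a URL to its HTML text.
    Every frame's content is joined into one big document, and every
    link found in any frame is reported as if it came from `url`.
    """
    top = TagCollector(url)
    top.feed(fetch(url))
    merged = []
    links = list(top.links)
    for frame_url in top.frames:
        body = fetch(frame_url)
        merged.append(body)
        child = TagCollector(frame_url)
        child.feed(body)
        links.extend(child.links)
    return "\n".join(merged), links


# Hypothetical sample site held in memory instead of fetched over HTTP.
PAGES = {
    "http://example.test/index.html":
        '<frameset><frame src="nav.html"><frame src="main.html"></frameset>',
    "http://example.test/nav.html":
        '<a href="about.html">About</a>',
    "http://example.test/main.html":
        '<p>Welcome</p><a href="/news/today.html">News</a>',
}

merged, links = read_frameset("http://example.test/index.html",
                              PAGES.__getitem__)
```

In this sketch the links inside nav.html and main.html are returned at the
frameset's own level, so a spider must go one level further to read the
pages those links point to. A page reached only through such a link is
fetched on its own, without its frameset.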

That's not what I'm worried about.  If you find one of the pages
directly referenced from the frameset, then you get the entire
frameset.  But if you find one of the pages that is indirectly
referenced, you get the page only.  Is that behavior acceptable?  The
first one is nicer but the second one will be more common.

moo
------------------------------------------------------------
           Ron Samuel Klatchko - Software Jester
            Brightmail Inc - rsk@brightmail.com
Received on Sat Feb 26 06:47:13 2000