Re: RE: swish-e spider does not go beyond index.html

From: Roy Tennant <rtennant(at)>
Date: Fri Oct 16 1998 - 19:15:53 GMT
The answer to the problem of how to index frames? Just don't use frames!
<grin> Now seriously...

We did something very close to what Ron describes for indexing frames
pages. Luckily, the filenames of the individual frame components were
fairly distinctive, which let us use a ReplaceRules directive:

ReplaceRules replace "[a-z_0-9]*_m.*\.html" "index.html"

Notice that we use the regular expression capability here. This lets you
point any frame component at "index.html". The one remaining problem is
duplicate hits, which you can drop in the program that calls SWISH-E.
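A minimal Python sketch of what the rule above does to result paths, plus the duplicate-dropping step that the calling program would handle. It assumes SWISH-E's `replace` behaves like a single regex substitution (Python's `re.sub`); the file names are hypothetical, and this is not SWISH-E code.

```python
import re

# The ReplaceRules pattern from above, transcribed into Python regex syntax.
FRAME_COMPONENT = re.compile(r"[a-z_0-9]*_m.*\.html")

def rewrite(path: str) -> str:
    """Point any frame-component file at index.html (assumed ReplaceRules semantics)."""
    return FRAME_COMPONENT.sub("index.html", path)

def dedupe_hits(paths: list[str]) -> list[str]:
    """Rewrite each hit, then drop duplicates while keeping the original result order."""
    return list(dict.fromkeys(rewrite(p) for p in paths))

# Hypothetical search hits: two frame components of one frameset, plus an ordinary page.
hits = [
    "/guide/intro_main.html",
    "/guide/setup_menu.html",
    "/faq/answers.html",
]
print(dedupe_hits(hits))  # ['/guide/index.html', '/faq/answers.html']
```

Because `.*` can cross directory separators, a path such as `/my_maps/foo.html` would also collapse to `/index.html`; restricting the match to the last path segment (for example `[a-z_0-9]*_m[^/]*\.html`) would avoid that.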
Roy Tennant

On Fri, 16 Oct 1998, Ron Klatchko wrote:

> Christian Stalberg wrote:
> > Oops, someone reminded me that starting with a frames webpage will not work.
> > I have changed the IndexDir to a TOC page for the frames and it appears to
> > be working. Is there any special wisdom anyone can share re. using swish-e
> > to index frames webpages?
> Frames are a tricky situation when it comes to searching.  It would be
> simple to make the spider follow frame links as well, but what happens on
> retrieval?  SWISH would return the URL of the individual frame that
> contained what the user searched for, and the user would see only that
> frame instead of the carefully constructed frameset.
> There might be a solution to that in some clever file layout and use of
> ReplaceRules.  One idea is below.
> Another possibility would be to have a no-frames version with identical
> content.  SWISH can spider that currently.  This also has the nice
> benefit of opening your site to browsers that are not frames-aware.
> Unfortunately, even frames-aware browsers would end up with the
> non-frames version when they search.
> So, going back to the clever layout/rewrite idea.  Let's assume that
> swishspider can now follow frame links.  Also assume you have a basic
> frame set with the left side as a table of contents and the right side
> with your various data pages.
> In order to do this, you'll need a main directory and one subdirectory
> for each page.
> The main directory contains index.html which defines your frameset and
> toc.html which is your table of contents.  You have a series of
> directories called page1, page2, etc. inside of which you have
> page1.html, page2.html, etc.  Also, each of these directories contains
> index.html.  The difference between this and the main index.html is the
> starting page for the right side; the main index.html points to page 1,
> whereas each subdirectory's index.html points to its own page
> (page2/index.html has page2.html as the right-hand side).  For an
> example of this structure you can check out
> If you then introduce the rule:
>   ReplaceRules remove "page[0-9]+.html"
> a search hit on ../pageN/pageN.html gets rewritten to ../pageN/, which
> preserves the frameset.
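The layout and rule Ron describes can be sketched in Python. This is an illustration, not swishspider: it builds the directory tree in a temporary directory (the page count, frame widths, and markup are hypothetical), then applies the `remove` rule with `re.sub`, assuming that `remove` simply deletes the matched text. It also assumes the web server serves `index.html` when a directory URL such as `../pageN/` is requested.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical frameset markup: TOC on the left, content page on the right.
FRAMESET = """<html><frameset cols="25%,75%">
  <frame src="{toc}" name="toc">
  <frame src="{body}" name="body">
</frameset></html>
"""

def build_site(root: Path, n_pages: int) -> None:
    """Create a main frameset plus one subdirectory (with its own frameset) per page."""
    (root / "toc.html").write_text("<html><body>Table of contents</body></html>\n")
    # The main index.html starts the right-hand frame on page 1.
    (root / "index.html").write_text(FRAMESET.format(toc="toc.html", body="page1/page1.html"))
    for i in range(1, n_pages + 1):
        page_dir = root / f"page{i}"
        page_dir.mkdir()
        (page_dir / f"page{i}.html").write_text(f"<html><body>Page {i}</body></html>\n")
        # Each subdirectory's index.html reuses the shared TOC but starts on its own page.
        (page_dir / "index.html").write_text(
            FRAMESET.format(toc="../toc.html", body=f"page{i}.html"))

# The remove rule, transcribed to Python; removing means substituting "".
PAGE_FILE = re.compile(r"page[0-9]+.html")

def apply_remove_rule(path: str) -> str:
    return PAGE_FILE.sub("", path)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_site(root, 3)
    for hit in sorted(p.relative_to(root).as_posix() for p in root.glob("page*/page*.html")):
        rewritten = apply_remove_rule(hit)
        # The rewritten hit is a directory whose index.html is the matching frameset.
        print(hit, "->", rewritten, (root / rewritten / "index.html").exists())
```

The pattern leaves the `.` unescaped, so it matches any character; `page[0-9]+\.html` would be the stricter form.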
> More complicated use of frames would require even more thought, but it
> is a possibility.
> Are people interested in doing such a thing?  Should I modify
> swishspider to follow frame links?
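swishspider itself is a Perl script, and the change Ron proposes is not shown here. As a rough Python sketch of the idea only: a spider that follows frame links treats `<frame src>` the same way it treats `<a href>`, resolving each target against the page it came from. The base URL below is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags and src targets from <frame> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        wanted = {"a": "href", "frame": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))

def extract_links(base_url: str, html: str) -> list[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = ('<frameset cols="25%,75%"><frame src="toc.html">'
        '<frame src="page1/page1.html"></frameset>')
print(extract_links("http://www.example.com/docs/index.html", page))
```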
> moo
> ----------------------------------------------------------------------
>           Ron Klatchko - Manager, Advanced Technology Group           
>            UCSF Library and Center for Knowledge Management           
Received on Fri Oct 16 12:22:54 1998