Re: RE: swish-e spider does not go beyond index.html

From: Ron Klatchko <ron(at)not-real.ckm.ucsf.edu>
Date: Fri Oct 16 1998 - 19:02:54 GMT

Christian Stalberg wrote:
> Oops, someone reminded me that starting with a frames webpage will not work.
> I have changed the IndexDir to a TOC page for the frames and it appears to
> be working. Is there any special wisdom anyone can share re. using swish-e
> to index frames webpages?

Frames are a tricky situation when it comes to searching.  It would be
simple to make the spider follow frame links as well, but what happens on
retrieval?  SWISH would return the URL of the individual frame that
contained the match, and the user would see only that frame instead of
the frameset you so carefully constructed.
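
To make "follow frame links" concrete, here is a minimal sketch of the
link-extraction step.  It is written in Python for illustration only and
is not swishspider's code; FrameLinkParser and frame_links are
hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameLinkParser(HTMLParser):
    """Collect the src targets of <frame> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative src values against the frameset's URL.
                self.links.append(urljoin(self.base_url, src))


def frame_links(html, base_url):
    """Return absolute URLs for every frame referenced in html."""
    parser = FrameLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

A spider would queue the returned URLs just as it does ordinary
hyperlinks.  That part is easy; the retrieval problem described above is
the hard part.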

There might be a solution to that in some clever file layout and use of
ReplaceRules.  One idea is below.

Another possibility would be to have a no-frames version with identical
content.  SWISH can spider that currently.  This also has the nice
benefit of opening your site to browsers that are not frames-aware.
Unfortunately, even frames-aware browsers would end up with the
no-frames version when they search.

So, going back to the clever layout/rewrite idea.  Let's assume that
swishspider can now follow frame links.  Also assume you have a basic
frame set with the left side as a table of contents and the right side
with your various data pages.

In order to do this, you'll need a main directory and one subdirectory
for each page.

The main directory contains index.html which defines your frameset and
toc.html which is your table of contents.  You have a series of
directories called page1, page2, etc. inside of which you have
page1.html, page2.html, etc.  Also, each of these directories contains
index.html.  The difference between this and the main index.html is the
starting page for the right side; the main index.html points to page 1,
whereas each one in the subdirectories points to its own page
(page2/index.html has page2.html as the right-hand side).  For an
example of this structure you can check out
http://samiam.ckm.ucsf.edu/frame/
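
For anyone who wants to try the layout locally, the following is a rough
sketch that generates it.  The frameset markup, the column widths, the
frame names and the build_layout function are all my own assumptions for
illustration; they are not taken from the example site.

```python
from pathlib import Path

# Assumed frameset markup: table of contents on the left, data page on the right.
FRAMESET = """<html>
<frameset cols="25%,75%">
  <frame src="{toc}" name="toc">
  <frame src="{content}" name="content">
</frameset>
</html>
"""


def build_layout(root, page_count):
    """Write the main directory plus one subdirectory per page."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    # The main index.html starts with page 1 on the right-hand side.
    (root / "index.html").write_text(
        FRAMESET.format(toc="toc.html", content="page1/page1.html"))

    toc_links = []
    for n in range(1, page_count + 1):
        name = f"page{n}"
        page_dir = root / name
        page_dir.mkdir(exist_ok=True)

        # The data page itself.
        (page_dir / f"{name}.html").write_text(
            f"<html><body><h1>Page {n}</h1></body></html>\n")

        # Per-directory index.html: same frameset, but opening on its own page.
        (page_dir / "index.html").write_text(
            FRAMESET.format(toc="../toc.html", content=f"{name}.html"))

        toc_links.append(
            f'<a href="{name}/{name}.html" target="content">Page {n}</a><br>')

    (root / "toc.html").write_text(
        "<html><body>\n" + "\n".join(toc_links) + "\n</body></html>\n")
```

Calling build_layout("frame", 3) would produce frame/index.html,
frame/toc.html and frame/page1 through frame/page3, each holding its own
pageN.html and index.html.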

If you then introduce the rule:
  ReplaceRules remove "page[0-9]+.html"

a search result of ../pageN/pageN.html is rewritten to ../pageN/, which
loads that directory's index.html and so preserves the frameset.
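
The effect of that rule on result URLs can be simulated outside of SWISH.
This is only a sketch in Python, assuming the pattern is treated as a
regular expression; I have escaped the dot and anchored the match at the
end of the URL, which the rule above does not do, and rewrite_result is a
hypothetical name.

```python
import re

# Assumed regex form of the rule: a trailing "pageN.html" file name.
PAGE_FILE = re.compile(r"page[0-9]+\.html$")


def rewrite_result(url):
    """Strip a trailing pageN.html so the URL points at the directory."""
    return PAGE_FILE.sub("", url)


for url in (
    "http://samiam.ckm.ucsf.edu/frame/page2/page2.html",
    "http://samiam.ckm.ucsf.edu/frame/toc.html",
):
    print(url, "->", rewrite_result(url))
```

The first URL comes back as .../frame/page2/ and the second is left
unchanged, so toc.html would still show up as a bare frame if it matched
a search.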

More complicated use of frames would require even more thought, but it
is a possibility.

Are people interested in doing such a thing?  Should I modify
swishspider to be able to follow frame links?

moo
----------------------------------------------------------------------
          Ron Klatchko - Manager, Advanced Technology Group           
           UCSF Library and Center for Knowledge Management           
                           ron@ckm.ucsf.edu
Received on Fri Oct 16 12:13:04 1998