RE: RE: swish-e spider does not go beyond index.html

From: Christian Stalberg <Christian.Stalberg.stalberg(at)>
Date: Tue Oct 27 1998 - 15:32:56 GMT

Christian Stalberg
Web Consultant, Dept. 3187
NORTEL, Signaling Solutions Group
* Phone: (919) 905-4975 ESN 355-4975
*      Fax: (919) 905-8313 ESN 395-8313
*     Email:

	From:  Ron Klatchko []
	Sent:  Friday, October 16, 1998 3:14 PM
	To:  Multiple recipients of list
	Subject:  [SWISH-E] RE: swish-e spider does not go beyond index.html

	Christian Stalberg wrote:
	> Oops, someone reminded me that starting with a frames webpage will
	> not work. I have changed the IndexDir to a TOC page for the frames
	> and it appears to
	> be working. Is there any special wisdom anyone can share re. using
	> to index frames webpages?

	Frames are a tricky situation when it comes to searching.  It would be
	simple to make the spider follow frame links as well, but what happens
	on retrieval?  SWISH would return the URL of the individual frame that
	contained what the user was searching for, and the user would see only
	that frame instead of the nicely constructed frameset you built.
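
As a rough illustration of what "following frame links" involves, here is a minimal Python sketch that collects the `src` targets of `<frame>` tags from a frameset page. It is not swishspider code; the names `FrameLinkParser` and `frame_links` and the example.com URL are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameLinkParser(HTMLParser):
    """Collects the src URLs of <frame> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <FRAME SRC=...>
        # is handled the same way as <frame src=...>.
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative frame targets against the frameset's URL.
                self.links.append(urljoin(self.base_url, src))


def frame_links(html, base_url):
    """Return absolute URLs for every frame referenced in the given HTML."""
    parser = FrameLinkParser(base_url)
    parser.feed(html)
    return parser.links


frameset = (
    '<frameset cols="20%,80%">'
    '<frame src="toc.html"><frame src="page1/page1.html">'
    "</frameset>"
)
print(frame_links(frameset, "http://example.com/docs/index.html"))
```

A spider could queue these URLs the same way it queues ordinary `<a href>` links. The retrieval problem described above remains: the indexed URLs are those of the individual frames.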

	There might be a solution to that in some clever file layout and use
	of ReplaceRules.  One idea is below.

	Another possibility would be to have a no-frames version with the
	same content.  SWISH can spider that currently.  This also has the
	nice benefit of opening your site to non-frames-aware browsers.
	Unfortunately, even frames-aware browsers would end up with the
	non-frames version when they search.

	So, going back to the clever layout/rewrite idea.  Let's assume that
	swishspider can now follow frame links.  Also assume you have a
	frameset with the left side as a table of contents and the right
	side with your various data pages.

	In order to do this, you'll need a main directory and one
	subdirectory for each page.

	The main directory contains index.html, which defines your frameset,
	and toc.html, which is your table of contents.  You have a series of
	directories called page1, page2, etc., inside of which you have
	page1.html, page2.html, etc.  Also, each of these directories has its
	own index.html.  The difference between this and the main index.html
	is the starting page for the right side; the main index.html points to
	one starting page, whereas each one in the subdirectories points to
	its own page (page2/index.html has page2.html as the right-hand side).
	For an example of this structure you can check out
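
To make the layout concrete, the following Python sketch generates it on disk. Everything beyond the file names mentioned above is an assumption for illustration: the function name `build_layout`, the frame names, the 20%/80% column split, and the choice of which page the main index.html opens with.

```python
from pathlib import Path

# Frameset template: table of contents on the left, a data page on the right.
FRAMESET = """<html>
<frameset cols="20%,80%">
  <frame src="{toc}" name="toc">
  <frame src="{content}" name="content">
</frameset>
</html>
"""


def build_layout(root, pages, default_page):
    """Create the main directory plus one subdirectory per page."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    # Main index.html: right side starts on the chosen default page.
    (root / "index.html").write_text(
        FRAMESET.format(toc="toc.html",
                        content=f"{default_page}/{default_page}.html")
    )

    # toc.html: links open in the right-hand ("content") frame.
    links = "\n".join(
        f'<a href="{p}/{p}.html" target="content">{p}</a><br>' for p in pages
    )
    (root / "toc.html").write_text(f"<html><body>\n{links}\n</body></html>\n")

    for p in pages:
        subdir = root / p
        subdir.mkdir(exist_ok=True)
        (subdir / f"{p}.html").write_text(
            f"<html><body>Content of {p}</body></html>\n"
        )
        # Per-page index.html: same frameset, but the right side is this page.
        (subdir / "index.html").write_text(
            FRAMESET.format(toc="../toc.html", content=f"{p}.html")
        )
```

Calling `build_layout("site", ["page1", "page2"], "page1")` would produce site/index.html, site/toc.html, and site/pageN/{index.html, pageN.html} for each page.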

	If you then introduce the rule:
	  ReplaceRules remove "page[0-9]+.html"

	a search that gets ../pageN/pageN.html gets rewritten to ../pageN/,
	which loads that directory's index.html and so preserves the frameset.
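
The effect of that rule on a search result URL can be imitated in Python as below. This only mimics the rewrite and is not how SWISH applies ReplaceRules internally; it assumes the pattern is treated as a regular expression, and the Python version escapes the dot so it matches a literal ".". The function name and the example.com URLs are hypothetical.

```python
import re

# Python equivalent of the pattern in the rule above, with the dot escaped.
PAGE_FILE = re.compile(r"page[0-9]+\.html")


def rewrite_result_url(url):
    """Strip the pageN.html file name, leaving the pageN/ directory URL."""
    return PAGE_FILE.sub("", url)


print(rewrite_result_url("http://example.com/docs/page2/page2.html"))
print(rewrite_result_url("http://example.com/docs/toc.html"))
```

The first URL loses its file name and ends in page2/, so the web server would serve page2/index.html (assuming index.html is its directory index). The second URL is left unchanged because it does not match the pattern.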

	More complicated use of frames would require even more thought, but
	it is a possibility.

	Are people interested in doing such a thing?  Should I modify
	swishspider to be able to follow framelinks?

	          Ron Klatchko - Manager, Advanced Technology Group

	           UCSF Library and Center for Knowledge Management

Received on Tue Oct 27 09:39:43 1998