YES!
Christian
Christian Stalberg
Web Consultant, Dept. 3187
NORTEL, Signaling Solutions Group
* Phone: (919) 905-4975 ESN 355-4975
* Fax: (919) 905-8313 ESN 395-8313
* Email: Christian.Stalberg.stalberg@nt.com
----------
From: Ron Klatchko [SMTP:ron@ckm.ucsf.edu]
Sent: Friday, October 16, 1998 3:14 PM
To: Multiple recipients of list
Subject: [SWISH-E] RE: swish-e spider does not go beyond index.html
Christian Stalberg wrote:
> Oops, someone reminded me that starting with a frames webpage will not work.
> I have changed the IndexDir to a TOC page for the frames and it appears to
> be working. Is there any special wisdom anyone can share re. using swish-e
> to index frames webpages?
Frames are a tricky situation when it comes to searching. It would be
simple to make the spider follow frame links as well, but what happens on
retrieval? SWISH would return the URL of the individual frame that
contained what the user was searching for, and the user would see only
that frame instead of the nicely constructed frameset you built.
There might be a solution to that in some clever file layout and use of
ReplaceRules. One idea is below.
Another possibility would be to have a no-frames version with identical
content. SWISH can spider that currently. This also has the nice benefit
of opening your site to browsers that are not frames-aware.
Unfortunately, even frames-aware browsers would end up with the
no-frames version when they search.
So, going back to the clever layout/rewrite idea. Let's assume that
swishspider can now follow frame links. Also assume you have a basic
frameset with a table of contents on the left side and your various
data pages on the right side.
In order to do this, you'll need a main directory and one subdirectory
for each page.
The main directory contains index.html, which defines your frameset,
and toc.html, which is your table of contents. You have a series of
directories called page1, page2, etc., inside of which you have
page1.html, page2.html, etc. Each of these directories also contains
an index.html. The difference between it and the main index.html is
the starting page for the right side: the main index.html points to
page 1, whereas the one in each subdirectory points to its own page
(page2/index.html has page2.html as the right-hand side). For an
example of this structure you can check out
http://samiam.ckm.ucsf.edu/frame/
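To make the layout concrete, here is a minimal Python sketch that generates it. The page count, the frame names ("toc" and "main"), the 25%/75% column split and the placeholder page contents are all hypothetical; only the file and directory names follow the description above.

```python
from pathlib import Path

# Hypothetical frameset template: the left frame holds the shared TOC and the
# right frame holds the starting page. Frame names and sizes are illustrative.
FRAMESET = """<html>
<frameset cols="25%,75%">
  <frame src="{toc}" name="toc">
  <frame src="{page}" name="main">
</frameset>
</html>
"""


def layout_files(num_pages: int) -> dict[str, str]:
    """Map relative file paths to contents for the one-directory-per-page layout."""
    toc_links = "\n".join(
        f'<a href="page{n}/page{n}.html" target="main">Page {n}</a><br>'
        for n in range(1, num_pages + 1)
    )
    files = {
        # Main frameset: TOC on the left, page 1 on the right.
        "index.html": FRAMESET.format(toc="toc.html", page="page1/page1.html"),
        # TOC links load each page into the right-hand ("main") frame.
        "toc.html": f"<html><body>\n{toc_links}\n</body></html>\n",
    }
    for n in range(1, num_pages + 1):
        # Placeholder content page.
        files[f"page{n}/page{n}.html"] = f"<html><body><h1>Page {n}</h1></body></html>\n"
        # Per-page frameset: same TOC (one level up), but this page on the right.
        files[f"page{n}/index.html"] = FRAMESET.format(toc="../toc.html", page=f"page{n}.html")
    return files


def write_layout(root: Path, files: dict[str, str]) -> None:
    """Write the generated files under root, creating subdirectories as needed."""
    for rel, text in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
```

For example, `write_layout(Path("frame"), layout_files(3))` would create frame/index.html, frame/toc.html and frame/page1/ through frame/page3/. The TOC links stay correct inside the per-page framesets because they are resolved relative to toc.html, which always lives in the main directory.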
If you then introduce the rule:
ReplaceRules remove "page[0-9]+.html"
a search result of ../pageN/pageN.html gets rewritten to ../pageN/,
which preserves the frameset.
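The effect of that rule on search results can be emulated in a few lines of Python. This is only an illustration of a regex "remove" (every match of the pattern is deleted from the result URL), not SWISH-E's own implementation; the function name is hypothetical.

```python
import re

# The pattern from the ReplaceRules line above. The unescaped "." matches any
# character; r"page[0-9]+\.html" would be the stricter form.
PATTERN = re.compile(r"page[0-9]+.html")


def apply_remove_rule(url: str) -> str:
    """Delete every match of PATTERN from a search-result URL."""
    return PATTERN.sub("", url)


if __name__ == "__main__":
    for hit in ["http://samiam.ckm.ucsf.edu/frame/page2/page2.html",
                "http://samiam.ckm.ucsf.edu/frame/toc.html"]:
        print(hit, "->", apply_remove_rule(hit))
```

A hit on page2/page2.html becomes the page2/ directory URL, which the web server answers with page2/index.html, the frameset that shows page 2 on the right. URLs that do not match, such as toc.html, pass through unchanged.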
More complicated uses of frames would require even more thought, but it
is a possibility.
Are people interested in doing such a thing? Should I modify
swishspider to be able to follow frame links?
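As a rough idea of what following frame links involves, here is a minimal Python sketch of a link extractor that treats `<frame src>` the same way as `<a href>`. It is not swishspider's code, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> links and <frame src> links from a single page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            target = attrs.get("href")
        elif tag == "frame":
            # A spider that only looks at <a href> never leaves a frameset
            # page; treating <frame src> as a link is the change in question.
            target = attrs.get("src")
        else:
            return
        if target:
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, target))


def extract_links(base_url: str, html: str) -> list[str]:
    """Return absolute URLs of all ordinary and frame links in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Fed the main index.html of the layout described above, this would yield the URLs of toc.html and page1/page1.html, so the spider could reach the content pages that a frameset otherwise hides.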
moo
----------------------------------------------------------------------
Ron Klatchko - Manager, Advanced Technology Group
UCSF Library and Center for Knowledge Management
ron@ckm.ucsf.edu
Received on Tue Oct 27 09:39:43 1998