Re: Re: Frames

From: Matteo Barbieri <matteo(at)>
Date: Mon Jun 28 1999 - 15:45:22 GMT
Besides the frame issue, which I'll solve later, I think some of the
problems lie in the HTTP spider.
For example,
I put as the start page
(you can check it just to see how it is formatted)

This is what I got when indexing:

IndexFile /online/www/home/glamm/myindex1
"glammhttp.config" 192 lines, 7399 characters
# /usr/local/bin/swish-e -c glammhttp.config -S http
Indexing Data Source: "HTTP-Crawler"
retrieving (0)...
 (138 words)

Removing very common words... no words removed.
Writing main index... 96 unique words indexed.
Writing file index... 1 file indexed.
Running time: 1 minute, 1 second.
Indexing done!

As a matter of fact, it indexed just that one page,
even though the variable
MaxDepth 5
is set to 5
and that file (latosx.htm) contains some links.

Why doesn't it follow those links?
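
To make the expected behaviour concrete, here is a minimal sketch in Python of a depth-limited spider that treats <frame src> like an ordinary link. It is not swish-e's spider code; LinkCollector, crawl and the caller-supplied fetch function are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect link targets from <a href> and <frame>/<iframe src> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("frame", "iframe") and attrs.get("src"):
            # A spider that ignores this branch stops at the frameset page.
            self.links.append(urljoin(self.base_url, attrs["src"]))


def crawl(start, fetch, max_depth):
    """Breadth-first walk from `start`, following links up to `max_depth` hops.

    `fetch` is a caller-supplied function mapping a URL to its HTML text.
    Pages found at exactly `max_depth` are recorded but not parsed for links.
    """
    visited = []
    seen = {start}
    queue = [(start, 0)]
    while queue:
        url, depth = queue.pop(0)
        visited.append(url)
        if depth == max_depth:
            continue
        parser = LinkCollector(url)
        parser.feed(fetch(url))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With a depth limit of 5, a walk like this would go well past the start page as long as the links are ones the parser recognises.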

*********** REPLY SEPARATOR  ***********

On 28/06/99 at 8.06 Roy Tennant wrote:

>The way we have handled this is to use the regular expressions capability
>to replace the indexed file name with the frameset. That is, if
>"mypage.html" is the page that sets up the frames and calls the other page
>fragments, then name the page fragments uniquely and rename them in the
>index using "ReplaceRules" in your configuration file.
>mypage.html indexed under its own name
>frag1.html indexed as "mypage.html"
>frag2.html indexed as "mypage.html"
>Thus all of the pieces point to the frameset.
>On Mon, 28 Jun 1999, Dan Brickley wrote:
>> On Mon, 28 Jun 1999, Matteo Barbieri wrote:
>> > I successfully created my first index file in filesystem mode..
>> > In http mode I found that the robot doesn't traverse the site
>> > but stops on the first html
>> > I don't get back any error so I am wondering if the spider is
>> > frame aware.
>> As an aside, it's difficult in the general case building a
>> frameset-aware robot and search tool, since the composite-frameset
>> doesn't have its own URL, so you'd need to auto-generate the appropriate
>> frameset and populate it with the two or three appropriate URLs if you
>> wanted to present users with the pages they'd found. (otherwise you can
>> show them the page, but they'd lose all navigational context from the
>> surrounding frame parts)
>> Dan
>> --
>> Institute for Learning and Research Technology
>> University of Bristol,  Bristol BS8 1TN, UK.   phone:+44(0)117-9287096
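
The renaming Roy describes in the quoted message above can be sketched roughly as follows. This is plain Python illustrating the idea, not swish-e configuration syntax; the pattern, the REPLACE_RULES table and the indexed_name function are assumptions made for the example, with file names taken from his message.

```python
import re

# Hypothetical rule table: each fragment-name pattern maps to the frameset
# page that should appear in search results instead of the fragment.
REPLACE_RULES = [
    (re.compile(r"frag\d+\.html$"), "mypage.html"),
]


def indexed_name(path):
    """Return the name under which a file would be recorded in the index."""
    for pattern, frameset in REPLACE_RULES:
        if pattern.search(path):
            # Swap only the matching file name, keeping any directory prefix.
            return pattern.sub(frameset, path)
    return path


for name in ("mypage.html", "frag1.html", "frag2.html"):
    print(name, "indexed as", indexed_name(name))
```

The fragments are still indexed for their text; only the name stored with each hit changes, so every result points to the frameset.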

  Dott. Matteo Barbieri
  GLAMM Interactive
  V.le Corsica 7, 20133 Milano
  Tel.  +39 - 2 - 74.81.171 	Fax.  +39 - 2 - 74.81.1726 
Received on Mon Jun 28 08:37:38 1999