On Thu, 5 Mar 1998 00prneubauer@bsuvc.bsu.edu wrote:
> Paul J. Lukas wrote:
> >
> > The breadth-first strategy of SWISH++ rather than the depth-
> > first one of SWISH-E is certainly better suited to web
> > indexing.
>
> This is an interesting proposition, but I'm afraid it's not obvious to
> me why this should be so.
The reason is simple: depth first would take an extremely long
time to complete and blow your stack moste likely since, every
time you come to a link, you'd follow it to another page and
possibly to another site. You'd never get around to finishing
the site you started at until months later. Breadth-first
would completes an entire page (and therefore an entire sitre)
at a time before moving on.
> At first glance, a breadth-first strategy looks like it would spend
> all its time listing sites and never get around to individual
> pages/documents.
You don't understand.
> I think it is clear that this strategy would never work for something like
> Altavista, so I'm sure that this was not what Paul Lukas was referring to.
> :-)
The name is "Lucas" with a 'C', the far more common spelling.
> Still, even for a limited number of sites (e.g., one) it's not obvious
> to me why breadth-first is going to be faster, more efficient or
> otherwise better suited than depth-first.
Well, what can I say. Consult an algorithms text.
> Depth-first strikes me as more obvious/natural,
Bubble sort strikes people as natural, but it sucks as a
sorting algorithm.
> but I have certainly not ever thought about it in any detail.
So let me understand: you've never though about it in any
detail, yet you're coming to conclusions?
> Is the efficiency dependent on how broad the breadth is and/or how deep the
> depth is?
I never said the efficiency issue was related to whether it does
breath- or depth-first, so the answer is no.
> If not, what makes one better for web indexing?
See above.
> Paul L. says that swish++ is "an order of magnitude faster than SWISH-E." I
> am not going to dispute the truth of this statement, but if the factor that
> makes it true is a breadth-first algorithm rather than a depth-first
> algorithm, then there is probably a simple explanation for why this is so.
I never cited that as a reason precicely because it isn't a
reason. Read the README for the real reasons why it is faster.
- Paul J. Lucas
NASA Ames Research Center Caelum Research Corporation
Moffett Field, California San Jose, California
<pjl AT ptolemy DOT arc DOT nasa DOT gov>
Received on Thu Mar 5 09:15:31 1998