Paul J. Lukas wrote:
>
> The breadth-first strategy of SWISH++ rather than the depth-
> first one of SWISH-E is certainly better suited to web
> indexing.

This is an interesting proposition, but I'm afraid it's not obvious to
me why this should be so. For reasons I currently don't understand, I
appear to have missed getting some mail recently, so I am not sure
exactly what the matter under discussion was here, but it appears to
involve indexing remote web sites (in addition to | instead of) a
local web site.

At first glance, a breadth-first strategy looks like it would spend
all its time listing sites and never get around to individual
pages/documents. I think it is clear that this strategy would never
work for something like AltaVista, so I'm sure that this was not what
Paul Lukas was referring to. :-)

Still, even for a limited number of sites (e.g., one) it's not obvious
to me why breadth-first is going to be faster, more efficient or
otherwise better suited than depth-first. Depth-first strikes me as
more obvious/natural, but I have certainly never thought about it
in any detail. Is the efficiency dependent on how broad the breadth
is and/or how deep the depth is? If so, at what point does it flip?
If not, what makes one better for web indexing? Paul L. says that
SWISH++ is "an order of magnitude faster than SWISH-E." I am not
going to dispute the truth of this statement, but if the factor that
makes it true is a breadth-first algorithm rather than a depth-first
algorithm, then there is probably a simple explanation for why this is
so. I don't see it at the moment. I humbly beg enlightenment.
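
To make the question concrete, here is a minimal sketch of the two
traversal orders over a toy site. It is not taken from SWISH++ or
SWISH-E; the LINKS table and the crawl() function are hypothetical and
exist only for illustration. The only difference between the two
strategies is which end of the frontier the next page is taken from.

```python
from collections import deque

# Hypothetical toy site (an assumption for illustration):
# each page maps to the pages it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": [],
    "/a/2": ["/"],   # a link back to the root, to exercise the "seen" check
    "/b/1": [],
}

def crawl(start, breadth_first):
    """Return the pages reachable from start, in the order they are visited."""
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        # Breadth-first takes the oldest pending page (queue);
        # depth-first takes the newest pending page (stack).
        page = frontier.popleft() if breadth_first else frontier.pop()
        order.append(page)
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

bfs = crawl("/", breadth_first=True)
dfs = crawl("/", breadth_first=False)
print(bfs)
print(dfs)
```

In this sketch both orders visit the same pages exactly once, so the
traversal order by itself does not change how many pages are fetched.
It measures nothing about memory use or I/O behavior, so it cannot say
where a speed difference between the two programs might come from.
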
Paul
========
Paul Neubauer prn@bsu.edu 00prneubauer@bsuvc.bsu.edu
For PGP Public Key send mail with subject="Send PGP Public Key"
1024 bits -- Key ID: 3FEB993D
Key Fingerprint: 85 AA A5 91 00 49 7A 7B 23 26 F7 B8 DB 72 C9 48
Received on Thu Mar 5 05:45:13 1998