Re: advantages and disadvantages of indexing via the spider

From: Greg Fenton <greg_fenton(at)not-real.yahoo.com>
Date: Mon Feb 16 2004 - 19:06:33 GMT
--- Eric Lease Morgan <emorgan@nd.edu> wrote:
> 
> What are the advantages and disadvantages of indexing via the 
> spider?
> 

Since you are talking about a "remote site", you have to use, as you
said, either spider.pl or some other crawler to fetch the pages.

Ignoring the features of one crawler over another, the upsides of
spider.pl are lower disk requirements (no local copy of the site is
kept) and a guarantee of "fresh" data.  The downside is that if you
need to rebuild the database, indexing will be slower than indexing a
pre-crawled local disk cache, because every page has to be fetched
over the network again.
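The trade-off above comes down to whether fetched pages are kept on
disk.  As a rough illustration only (this is not how spider.pl itself
works), here is a minimal Python sketch of the pre-crawled cache
approach: the first run fetches each page and stores it, and a rebuild
reads the stored copy instead of going back over the network.  The
`cached_fetch` function, the hashed file naming and the injectable
`fetch` callable are all assumptions made for the example.

```python
import hashlib
import os
from typing import Callable


def cached_fetch(url: str, cache_dir: str, fetch: Callable[[str], bytes]) -> bytes:
    """Return the body of url, reading it from cache_dir when present.

    Hypothetical helper: on a cache miss, call fetch(url) (e.g. an HTTP GET)
    and store the result so a later rebuild can read it from local disk.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL to get a filesystem-safe cache file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        # Cache hit: fast, uses disk space, and may be stale.
        with open(path, "rb") as f:
            return f.read()
    # Cache miss: fresh data, but costs a network round trip.
    body = fetch(url)
    with open(path, "wb") as f:
        f.write(body)
    return body
```

With only the standard library, `fetch` could be
`lambda u: urllib.request.urlopen(u).read()`.  Deleting the cache
directory before each rebuild gets you back to the spider.pl trade-off:
fresh data and little disk use, but every page is fetched again.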

We use spider.pl for our *local* site because we have dynamic content
(e.g. Server Side Includes), so a filesystem crawl would either be
inaccurate or require more coding on our part.  Since we have an
internal staging server, we don't impact the production site when we
need to rebuild the database a few times a day.
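To make the "more coding" point concrete: a filesystem crawl of SSI
pages sees the raw directives rather than the text the server sends.
A hypothetical Python sketch of that extra step, inlining
`<!--#include ... -->` directives from local files before indexing,
might look like the following.  The `expand_includes` name and the
`read` callable (mapping an include path to file contents) are
illustrative assumptions, and the sketch ignores other SSI commands
such as `#echo`, `#if` and `#exec`, which is part of why letting the
web server render pages for the spider is simpler.

```python
import re
from typing import Callable

# Matches e.g. <!--#include virtual="/inc/header.html" -->
SSI_INCLUDE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')


def expand_includes(text: str, read: Callable[[str], str], depth: int = 0) -> str:
    """Inline SSI include directives so indexed text matches the served page.

    Hypothetical helper: only handles #include; nested includes are expanded
    recursively up to a fixed depth to guard against include loops.
    """
    if depth > 10:
        raise RecursionError("SSI includes nested too deeply")

    def replace(match: re.Match) -> str:
        included = read(match.group(1))
        return expand_includes(included, read, depth + 1)

    return SSI_INCLUDE.sub(replace, text)
```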

Hope this helps,
greg_fenton.

=====
Greg Fenton
greg_fenton@yahoo.com

Received on Mon Feb 16 11:06:41 2004