Re: Merging vs. Spider

From: Gerald J. Klaas <gklaas(at)not-real.arb.ca.gov>
Date: Thu Oct 03 2002 - 17:09:56 GMT
>In test_url() you first rewrite the URL into a
>path, check for the file on disk, if there use that, if not you let the
>spider fetch it from the server.
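
For readers following the thread, the check described in the quote above could look roughly like the sketch below. It is written in Python for illustration only and is not the spider's actual test_url() callback; the names url_to_path and choose_source, the doc_root argument, and the index.html default are all assumptions.

```python
import os
from urllib.parse import urlsplit, unquote


def url_to_path(url, doc_root, index_name="index.html"):
    """Map a URL onto a file path under doc_root (hypothetical layout)."""
    path = unquote(urlsplit(url).path)
    # Assume directory-style URLs are served from an index file.
    if path == "" or path.endswith("/"):
        path += index_name
    root = os.path.realpath(doc_root)
    candidate = os.path.realpath(os.path.join(root, path.lstrip("/")))
    # Refuse paths that escape the document root (e.g. via "..").
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


def choose_source(url, doc_root):
    """Return ("disk", path) if the file exists locally, else ("fetch", url)."""
    path = url_to_path(url, doc_root)
    if path is not None and os.path.isfile(path):
        return ("disk", path)
    return ("fetch", url)
```

A caller would read the file for a "disk" result and hand the URL to the spider for a "fetch" result.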

That was my first thought as well, but I can tell
you we use a proxying web server at ARB ( http://www.arb.ca.gov )
and index close to 30,000 docs in under an hour, so
I don't see that you'll gain that much.  *My* approach
would be to run the spider right after a cache flush and
NOT do it every night; the cache will stay up to date
on its own.  I'd even think you might want to do that anyway, so
your users are always getting pages from cache.  IMHO.

Gerald
Received on Thu Oct 3 17:16:02 2002