converting .temp indices to usable indices

From: Dave Stevens <dstevens(at)>
Date: Sun Dec 07 2003 - 01:25:33 GMT
The spider is doing pretty well, nearly a million pages crawled in the
last couple of weeks.  One issue I just came on is with a dynamic site
that hosts several trade publications using a common app to provide
content from each of the pubs.  The URL is the same as  The app only uses the arguments
from the URL, not the domain name.  For future crawls I'm pretty sure I
can filter for only what I want and crawl this site on its own. (I want
mag=7.)  It appears I can do that with a callback.
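
To make the filtering idea concrete, here is a rough sketch of the test such a callback might apply: accept a URL only when its query string carries mag=7, and ignore the host name, since the app itself ignores it. This is plain Python for illustration only, not the spider's actual configuration syntax; the function name want_url and the example.com URLs are made up.

```python
from urllib.parse import urlsplit, parse_qs

WANTED_MAG = "7"  # from the post: only mag=7 is wanted


def want_url(url: str) -> bool:
    """Return True only when the URL's query string contains mag=7.

    The host name is deliberately not checked, because the app
    serves the same content regardless of the domain.
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each argument name to a list of its values
    return WANTED_MAG in query.get("mag", [])


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the check
    for candidate in (
        "http://example.com/app?mag=7&id=3",
        "http://example.com/app?mag=3&id=3",
        "http://example.com/app",
    ):
        print(candidate, want_url(candidate))
```

A real callback would return the equivalent true/false value to the spider so that URLs for the other publications are skipped before they are fetched.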

The issue here is that this crawl is about four days old and has about a
dozen other sites in the index.  The prop.temp file and the .temp index
file are being written.  If I kill this crawl by terminating, is
there any way to convert those .temp files left by the terminated crawl to
usable indices?  This one has so much junk in it that it is probably not
usable for this, but I'd like to get a look at what was spidered from the
other 12 sites.  I've looked in the archive and manual and couldn't find

Why isn't there a Swish-e O'Reilly book? ;-) The docs and list are really
good but a larger reference with more production examples would be a great


Received on Sun Dec 7 01:25:38 2003