Re: 2.4.3 Refuses to Index Virtual Host

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Apr 09 2005 - 16:48:15 GMT

On Sat, Apr 09, 2005 at 07:29:20AM -0700, fh oregon wrote:
> As for your second point, I'm not any kind of company - just a guy with 
> a fairly large personal web site who (as a hobby) hosts some email lists 
> and web sites for a car club and a food club.

But http://mysite.com looks like an email/hosting service.  That's not
you?  Carclub.com looks like a company, too.  So I'm confused.  Or are
those just names you borrowed to use on this list?


> Since the root of 
> "carclub.com"  ("/CARS") is contained within the "mysite.com" tree, I 
> would expect that it would be indexed on the same pass.  It would be 
> interesting to understand just how swish-e traverses the website tree - 
> in looking at the log file, it appears to be jumping around and not 
> following the directory structure as I would expect.  Kinda makes me go 
> "hmmmm".

Think about it.  All a web spider can do is follow links.  It has no
idea about your directory structure at all.  Many web sites don't even
have any directory structure -- they are all dynamically created from
a database.

There is no way for the spider to know that http://carclub.com is
contained in http://mysite.com's web or file space.  A spider that
didn't limit itself to specific hosts would end up trying to index
the entire Web.  If you want to index a host you have to tell the
spider to index that host.
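
To make that concrete, here is a rough Python sketch of a link-following
spider with a host limit.  It is not the code of the spider that ships
with swish-e, only an illustration of the idea.  The crawl() function,
the fetch callable and the toy PAGES "web" are all made up for the
example, and the host names are the placeholders used in this thread.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, allowed_hosts, fetch):
    """Breadth-first crawl from start_url, staying on allowed_hosts.

    fetch is a hypothetical callable: URL -> HTML string, or None.
    """
    seen = {start_url}
    queue = deque([start_url])
    indexed = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        indexed.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).hostname not in allowed_hosts:
                continue  # the host limit: never leave the listed hosts
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


# Toy "web" (invented for this example).  /CARS/ exists under mysite.com,
# but no page links to that URL, so the spider never hears about it.
PAGES = {
    "http://mysite.com/": '<a href="/about.html">About</a> '
                          '<a href="http://carclub.com/">Car club</a>',
    "http://mysite.com/about.html": '<a href="/">Home</a>',
    "http://mysite.com/CARS/": "<p>Same files as carclub.com</p>",
    "http://carclub.com/": '<a href="/events.html">Events</a>',
    "http://carclub.com/events.html": "<p>Events</p>",
}

one_host = crawl("http://mysite.com/", {"mysite.com"}, PAGES.get)
two_hosts = crawl("http://mysite.com/", {"mysite.com", "carclub.com"}, PAGES.get)
```

With only mysite.com allowed, the link to http://carclub.com/ is seen
but skipped.  With both hosts allowed, the car club pages are picked up
through that link.  In neither case does the spider reach
http://mysite.com/CARS/, because nothing links to that URL, and where
the files sit on disk never enters into it.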

Post more details about what you want to do and we can help you get it
done.

-- 
Bill Moseley
moseley@hank.org

Received on Sat Apr 9 09:48:31 2005