On Thu, Jan 04, 2007 at 09:50:46AM -0800, James wrote:
> I have been trying to spider/crawl an off-site sub-domain several times and
> it doesn't seem to be working. I also seem to have a problem trying to
> spider/crawl a certain regular domain. I can't figure out the problem. I
> know there is a redirect, from the www to the non-www. The spider picks up
> the robots.txt and nothing more. Are there things I need to be aware of
> about the spider that are not in the documentation?
Just things that are in the docs. ;) Did you turn on any of the
debugging features to find out why it's not fetching the pages you
think it should be fetching?
> Also, when will the spider be updated next?
In what way?
> And when will Swish-e be updated for UTF-8?
That's a large task, and it depends on when there's a big block of
developer time available.
> Also, I am concerned about something I read in the documentation about
> spidering sub-domains, that the index may point the links to the pages
> without the sub-domain. In other words, sub.domain.com/mypage.html would be
> indexed as domain.com/mypage.html, unless some tweaking of the code is
> done. Is this true?
There's a way to say that two domains are the same domain (used, for
example, where a site's pages can be accessed with or without the
leading "www."). Hosts that aren't listed as equivalent are left alone.
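
To illustrate the idea, here is a minimal sketch in Python. It is not the
spider's actual Perl code; the names SAME_HOSTS, canonical_url and same_site
are made up for the example, and example.com stands in for your domain. I
believe the corresponding setting in the spider.pl config is "same_hosts",
but check the spider docs for the exact form. The sketch only shows the
general technique: hosts listed as aliases are rewritten to one canonical
host, and anything not listed (such as a real sub-domain) keeps its own name.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: each key is treated as the same site as its value.
SAME_HOSTS = {"www.example.com": "example.com"}


def canonical_url(url, same_hosts=SAME_HOSTS):
    """Return url with an aliased host replaced by its canonical host.

    Hosts that are not in the alias table are returned unchanged
    (lower-cased). Any user:password part of the URL is dropped.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    canonical = same_hosts.get(host, host)
    # Keep an explicit port if the URL had one.
    netloc = canonical if parts.port is None else f"{canonical}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


def same_site(url_a, url_b, same_hosts=SAME_HOSTS):
    """True if both URLs resolve to the same canonical host (and port)."""
    a = urlsplit(canonical_url(url_a, same_hosts))
    b = urlsplit(canonical_url(url_b, same_hosts))
    return a.netloc == b.netloc
```

With that table, http://www.example.com/mypage.html is recorded as
http://example.com/mypage.html, while http://sub.example.com/mypage.html
keeps its sub-domain because it is not listed as an alias.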
--
Bill Moseley
moseley@hank.org
Received on Thu Jan 4 10:05:39 2007