I have tried several times to spider/crawl an off-site sub-domain, and it
does not seem to work. I am also having trouble spidering/crawling a certain
regular domain, and I can't figure out the problem. I know there is a
redirect from the www host name to the non-www one. The spider picks up
robots.txt and nothing more. Is there anything I need to be aware of about
the spider that is not in the documentation? Also, when will the spider be
updated next, and when will Swish-e be updated to support UTF-8?
I am also concerned about something I read in the documentation about
spidering sub-domains: that the index may store the links to the pages
without the sub-domain. In other words, sub.domain.com/mypage.html would be
indexed as domain.com/mypage.html unless the code is tweaked. Is this true?
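
To make that concern concrete, here is a minimal Python sketch of the
behavior I am worried about. It is not Swish-e or spider.pl code. The names
BASE_HOST, EQUIVALENT_HOSTS and indexed_url are hypothetical and only
illustrate my assumption that hosts treated as equivalent get folded into
one base host name before the URL is stored.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings for illustration only; not taken from spider.pl.
BASE_HOST = "domain.com"
EQUIVALENT_HOSTS = {"sub.domain.com", "www.domain.com"}


def indexed_url(url, base_host=BASE_HOST, equivalent_hosts=EQUIVALENT_HOSTS):
    """Return the URL as it would be stored if equivalent hosts are
    rewritten to base_host (the behavior I am asking about)."""
    parts = urlsplit(url)
    # hostname is lower-cased and excludes any port or credentials.
    if parts.hostname in equivalent_hosts:
        # Replacing netloc wholesale also drops any explicit port.
        parts = parts._replace(netloc=base_host)
    return urlunsplit(parts)


print(indexed_url("http://sub.domain.com/mypage.html"))
# prints http://domain.com/mypage.html
```

If the spider works roughly like this, search results would link to
domain.com/mypage.html even though the page only exists on the sub-domain.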
I know that although the questions are specific, some of the details are
vague, and I apologize. I would rather not post the actual URLs I am trying
to crawl.
Received on Thu Jan 4 09:58:02 2007