Crawling Sub-domains

From: James <swish.enhanced(at)not-real.gmail.com>
Date: Thu Jan 04 2007 - 17:57:40 GMT
I have tried several times to spider an off-site sub-domain, and it does
not seem to be working.  I also have a problem spidering a certain regular
domain, and I can't figure out the cause.  I know there is a redirect from
the www host to the non-www host.  The spider picks up robots.txt and
nothing more.  Are there things I need to be aware of about the spider that
are not in the documentation?  Also, when will the spider be updated next?
And when will Swish-e be updated for UTF-8?

Also, I am concerned about something I read in the documentation about
spidering sub-domains: that the index may record links to those pages
without the sub-domain.  In other words, sub.domain.com/mypage.html would be
indexed as domain.com/mypage.html unless the code is tweaked.  Is this true?
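
To make the two concerns above concrete, here is a minimal Python sketch of
how a crawler might treat hostnames.  It is not the Swish-e spider's code,
and the function names and the "aliases" parameter are hypothetical.  It
only illustrates that a strict same-host check would skip the target of a
www to non-www redirect, and that folding an alias host into the base host
would produce the domain.com/mypage.html form of the link.

```python
from urllib.parse import urlsplit, urlunsplit


def is_same_site(url, base_host, aliases=()):
    """Return True if the URL's host is the base host or a declared alias.

    Hypothetical helper: a crawler limited to one host would skip any URL
    (including a redirect target) for which this returns False.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host == base_host or host in aliases


def canonicalize(url, base_host, aliases=()):
    """Rewrite an alias host to the base host; leave other URLs unchanged.

    Hypothetical helper: this is the kind of rewriting that would make
    sub.domain.com/mypage.html show up as domain.com/mypage.html.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in aliases:
        # Preserve an explicit port, if any, when swapping the hostname.
        netloc = base_host if parts.port is None else f"{base_host}:{parts.port}"
        return urlunsplit(
            (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
        )
    return url
```

Under these assumptions, is_same_site("http://domain.com/index.html",
"www.domain.com") is False until "domain.com" is listed as an alias, which
would be consistent with a crawl that stops right after robots.txt.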

I know that although the questions are specific, some of the details are
vague.  I apologize.  I would rather not post the actual URLs I am trying
to crawl.

Received on Thu Jan 4 09:58:02 2007