crawling protected site

From: intervolved none <intervolved(at)not-real.yahoo.com>
Date: Tue May 10 2005 - 21:36:10 GMT
I need to crawl a website that is protected by Windows authentication, but when swish-e tries to crawl it, the request returns a 401 error. I pass the username and password in the URL, the same way that I have tried using IE (http://username:password@www.somedomain.com), and swish-e does not work. I have attached a condensed config file and the output that is generated when I run the command to index the site. Thanks in advance.
 
 
c:> type mytestsite.config   (subset of config file)

MaxDepth 0
Delay 0
IndexContents HTML2 .htm .html .shtml
IndexContents TXT .pdf 
IndexFile newprimarycare.index
StoreDescription HTML2 <body> 200
StoreDescription TXT 200
DefaultContents HTML2 
IndexDir http://myusrname:mypassword@mysite.com/main.html

 
 
c:> swish-e.exe -v 3 -S http -c "mytestsite.config"
 
..
Now fetching "http://myusrname:mypassword@mysite.com/main.html"... Status: 401.
..
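
For reference, a URL of the form used in IndexDir carries the credentials in its userinfo part, and a client that honors them would typically send them as an HTTP Basic Authorization header. The Python sketch below only illustrates that mapping. The helper name basic_auth_header is hypothetical, and nothing here shows what swish-e itself sends, or whether the server accepts Basic credentials at all rather than some other scheme.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(url):
    """Split credentials out of a URL (hypothetical helper, for illustration).

    Returns (url_without_credentials, header_value), where header_value is
    the value of an HTTP Basic "Authorization" header, or None when the URL
    carries no username.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, None

    # Rebuild the network location without the "user:password@" prefix.
    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = parts._replace(netloc=host).geturl()

    # Basic auth is base64("username:password"); it is encoded, not encrypted.
    raw = f"{parts.username}:{parts.password or ''}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return clean_url, f"Basic {token}"


# Placeholder URL taken from the config above.
clean_url, header = basic_auth_header(
    "http://myusrname:mypassword@mysite.com/main.html"
)
print(clean_url)
print(header)
```

If the server only offers a different scheme in its WWW-Authenticate response header, a Basic header built this way would still be answered with 401, so checking that response header is a reasonable first diagnostic step.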

 
 

		


*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Tue May 10 14:36:22 2005