Re: crawling protected site

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Tue May 10 2005 - 21:40:11 GMT
Have you tried spider.pl instead? It works much better than the -S http method.

I expect that the -S http method will, in fact, be deprecated in a future version.
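
For reference, here is a minimal sketch of what the switch to spider.pl might look like. It assumes Perl and the LWP modules are installed and that swish-e can find spider.pl; the URL and file names are the placeholders from the original message. Credentials are deliberately left out: spider.pl takes them from its own configuration file rather than from the swish-e config, and the spider.pl documentation covers the details. As far as I recall, that built-in support targets Basic authentication, so a site that only accepts Windows integrated logins may need more work.

```
# mytestsite.config (sketch: spider.pl fetches the pages instead of -S http)
IndexDir spider.pl
# "default" asks spider.pl to use its default settings for the URL that follows
SwishProgParameters default http://mysite.com/main.html
IndexFile newprimarycare.index
IndexContents HTML2 .htm .html .shtml
DefaultContents HTML2
StoreDescription HTML2 <body> 200
```

The index would then be built with -S prog instead of -S http, for example: swish-e.exe -v 3 -S prog -c mytestsite.config. Depending on the Windows install, the way perl and spider.pl are invoked may need adjusting.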

intervolved none scribbled on 5/10/05 4:33 PM:
> I need to crawl a website that is protected by Windows authentication, but when swish-e tries to crawl it, the server returns a 401 error. I pass in the username and password the same way I do in IE (http://username:password@www.somedomain.com), but it does not work with swish-e. I have attached a condensed config file and the output generated when I run the command to index the site. Thanks in advance.
>  
>  
> c:> type mytestsite.config   (subset of config file)
> 
> MaxDepth 0
> Delay 0
> IndexContents HTML2 .htm .html .shtml
> IndexContents TXT .pdf 
> IndexFile newprimarycare.index
> StoreDescription HTML2 <body> 200
> StoreDescription TXT 200
> DefaultContents HTML2 
> IndexDir http://myusrname:mypassword@mysite.com/main.html
> 
>  
>  
> c:> swish-e.exe -v 3 -S http -c "mytestsite.config"
>  
> ..
> Now fetching "http://myusrname:mypassword@mysite.com/main.html"... Status: 401.
> ..
> 
>  
>  
> 

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Tue May 10 14:40:11 2005