Re: URL-fixing with callback routines for spider.pl

From: koszalekopalek <koszalekopalek(at)not-real.interia.pl>
Date: Mon May 23 2005 - 09:23:04 GMT
Bill Moseley wrote:
> On Thu, May 19, 2005 at 09:07:34AM -0700, koszalekopalek wrote:
> 
>>Btw, any pointers on why the server is not happy
>>with use_cookies => 1, ?
> 
> 
> I'm not sure, but I'd like to find out -- just in case something isn't
> working like it should.

Bill,

I checked it over the weekend.

   use_cookies => 1,

started to work fine as soon as I lied about the user agent in the
config file:

   agent => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)',
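For reference, here is a minimal sketch of a spider.pl server entry that combines the settings from this thread (the spoofed agent, use_cookies, and the debug => 'headers' option Bill mentions). The base_url and email values are placeholders I made up; substitute your own site and contact address.

```perl
# Sketch of a spider.pl config file (e.g. test.conf).
# base_url and email are hypothetical placeholders, not the real site.
@servers = (
    {
        base_url    => 'http://www.example.com/',   # placeholder start URL
        email       => 'webmaster@example.com',     # placeholder contact address
        # Spoofed IE6 user agent; with it, use_cookies started working
        agent       => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)',
        use_cookies => 1,           # keep and resend cookies between requests
        debug       => 'headers',   # show HTTP headers, like SPIDER_DEBUG=headers
    },
);

1;   # return a true value when the file is loaded
```

It is run the same way as before, e.g. ./spider.pl test.conf >/dev/null, with the headers printed because of the debug setting.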


> You can do this:
> 
>    SPIDER_DEBUG=headers ./spider.pl test.conf >/dev/null

The server headers reveal that the site is powered by Microsoft:

   Server: Microsoft-IIS/6.0
   MicrosoftOfficeWebServer: 5.0_Pub
   X-AspNet-Version: 2.0.40607
   X-Powered-By: ASP.NET

I'll try to contact the webmaster and verify whether it is
an issue in the site code or just bad behavior of some MS
components. In the latter case, a FAQ entry might be useful.

Thanks,
Adam

> and see the headers.  You can also use debug => 'headers', in your
> config.
> 
> One thing I'm seeing is the Referer: header is the new $uri set in the
> filter_content() callback.  Your server isn't looking at the Referer
> header, is it?
> 


Received on Mon May 23 02:23:08 2005