Re: [swish-e] How do I index via HTTP when authentication is

From: Adam Douglas <ADouglas(at)not-real.venmarces.com>
Date: Thu Feb 07 2008 - 14:59:42 GMT
Okay, I am going to try this, but I'm confused about where to put this
@servers example. Do I place it in the spider.pl file, in a
SwishSpiderConfig.pl, or somewhere else? I already have two
configuration files: one for swish.cgi and one for the spider. I do my
indexing at the shell prompt like so: "swish-e -S prog -c
swish-e.public.conf". What also confuses me is that the base_url is
already set in swish-e.public.conf. Do I just use that, add the extra
query strings on to it, and then add a new line with use_cookies = 1?

Sorry for the confusion; it just appears that there are many possible
ways to achieve this. For obvious reasons I want to keep my
modifications in the configuration files and not modify any code if
possible.

Best,
Adam

> I'm not sure why it's any more dangerous to require/allow the 
> swish-e spider to login to an application than any other user 
> agent that presents credentials.  In fact, for a public-facing
> application, far more checks can be applied (username/password;
> IP address; one-of-a-kind user agent) to the spider than is
> feasible with a normal user's login.
> 
> Merely enabling cookies by itself presents just as much risk 
> of forgery.
> 
> Anyway, here's a snip from my @servers:
> 
> @servers = (
>          {
>          base_url    => 'http://my.domain.com/login.app?_function=checkpw&userid=swishe&password=swishe&remember=no',
>          use_cookies => 1,
> #        debug => DEBUG_URL | DEBUG_SKIPPED | DEBUG_FAILED | DEBUG_HEADERS,
>          delay_sec => 1,
>          test_url    => sub {
>                  # Reject only the logout link so the session stays valid.
>                  my $ok = !($_[0]->path =~ /login.app/ &&
>                             $_[0]->query =~ /_function=logout/);
>                  return 1 if $ok;
>                  return; },
> ...
> 
> Essentially, the spider logs in as the user 'swishe' so it sees the
> same content as any similarly privileged user.
> remember=no means don't give swish-e a long-term cookie to
> re-authenticate with.
> use_cookies allows the application to provide, and swish-e to use,
> the session cookies needed for access.
> test_url keeps the spider from following a link to log out, to
> ensure we follow all links.
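
To see what the quoted test_url callback accepts and rejects, here is a
minimal standalone sketch that can be run outside the spider. It assumes
the URI module from CPAN is installed and that the callback receives a
URI object as its first argument, as the quoted snippet implies. The
name $keep_url and the three sample URLs are hypothetical, and the dot
in login.app is escaped here as a small tightening of the original
pattern.

```perl
use strict;
use warnings;
use URI;    # assumption: URI from CPAN is available

# Same idea as the quoted test_url callback: reject only the logout link.
my $keep_url = sub {
    my ($uri) = @_;

    # query() returns undef when the URL has no query string.
    my $query = defined $uri->query ? $uri->query : '';

    my $is_logout = $uri->path =~ /login\.app/
                 && $query     =~ /_function=logout/;
    return $is_logout ? 0 : 1;
};

# Hypothetical URLs, for illustration only.
for my $url (
    'http://my.domain.com/login.app?_function=logout',
    'http://my.domain.com/login.app?_function=checkpw',
    'http://my.domain.com/docs/index.html',
) {
    my $verdict = $keep_url->( URI->new($url) ) ? 'spider' : 'skip';
    print "$verdict  $url\n";
}
```

Only the first URL is skipped; the login URL itself and ordinary pages
are still followed, so the session cookie obtained at login stays valid
for the rest of the crawl.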

Received on Thu Feb 7 09:59:44 2008