What does the debug output show when swish-e GETs the login page?
Does the site need cookies enabled?
Is the username field szUsername (as used in the form POST), or is it
szID, as in the swish-e GET below?
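
If the login is cookie-based, the @servers entry probably needs
use_cookies, as in the snippet quoted below. Here is a minimal sketch of
your config with that change. It assumes (unverified) that the site
tracks the login with a session cookie, that szID/szPWD are the
parameter names the login page actually reads, and that the logout page
lives under a path containing /logout:

```perl
# Sketch only: the @servers entry below plus use_cookies.
# Assumes a session-cookie login and szID/szPWD query parameters.
@servers = ({
    base_url    => 'https://www.venmarces.com/login/?szID=idhere&szPWD=passwordhere',
    same_hosts  => [ qw/www.venmarces.com/ ],
    agent       => 'swish-e spider http://swish-e.org/',
    email       => 'webmaster@domainnamehere.com',

    # Keep the cookies set by the login response and send them back
    # on later requests to the same site.
    use_cookies => 1,

    # Assumption: the logout page is under a path containing "/logout".
    # Refusing that URL keeps the spider from ending its own session.
    test_url    => sub { $_[0]->path !~ m{/logout} },

    delay_sec   => 1,
    max_time    => 10,
    max_files   => 100,
    max_indexed => 20,
    keep_alive  => 1,
    debug       => DEBUG_URL | DEBUG_SKIPPED | DEBUG_HEADERS,
});
1;
```

The DEBUG_* constants come from spider.pl itself. With DEBUG_HEADERS on,
the response to the login GET should show whether a Set-Cookie header is
returned.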
Bill
On Feb 20, 2008, at 9:10 AM, Adam Douglas wrote:
> Hi. All right, I have implemented this method to index the part of my
> web site that requires authentication. I have also altered my login to
> allow Swish-e to log in via the URL query string, and I manually
> verified that logging in this way works. However, when I index the web
> site it does not log in and just acts as a non-authenticated client.
> I'm obviously missing something here; any suggestions as to what I'm
> doing wrong?
>
> I index the web site like so: "swish-e -S prog -c swishe.venmarces.private.conf".
> Here is what I have in my SwishSpiderConfig.pl; the rest is just
> comments.
>
> @servers = ({
>     base_url    =>
>         'https://www.venmarces.com/login/?szID=idhere&szPWD=passwordhere',
>     same_hosts  => [ qw/www.venmarces.com/ ],
>     agent       => 'swish-e spider http://swish-e.org/',
>     email       => 'webmaster@domainnamehere.com',
>
>     # skip logout links: path contains "login" and query contains "logout"
>     test_url    => sub {
>         my $ok = !($_[0]->path =~ /login/ &&
>                    $_[0]->query =~ /logout/);
>         return 1 if $ok;
>         return;
>     },
>
>     delay_sec   => 1,    # Delay in seconds between requests
>     max_time    => 10,   # Max time to spider in minutes
>     max_files   => 100,  # Max unique URLs to spider
>     max_indexed => 20,   # Max number of files to send to swish for indexing
>     keep_alive  => 1,    # Enable keep-alive requests
>     debug       => DEBUG_URL | DEBUG_SKIPPED | DEBUG_HEADERS,
> } );
> 1;
>
> Also, when Swish-e finishes indexing, how would I get it to go to the
> URL "/logout/" to log out of the web site and end the session?
>
> By the way, I have the following configuration files set up:
>
> Search configuration - .venmarces.public.swishcgi.conf
> Search configuration - .venmarces.private.swishcgi.conf
> Swish-e configuration - swishe.venmarces.public.conf
> Swish-e configuration - swishe.venmarces.private.conf
> Spider.pl configuration - SwishSpiderConfig.pl (I have not made one for
> public yet).
>
> Best,
> Adam
>
>>> I'm not sure why it's any more dangerous to require/allow the swish-e
>>> spider to log in to an application than any other user agent that
>>> presents credentials. In fact, for a public-facing application, far
>>> more checks can be applied to the spider (username/password, IP
>>> address, one-of-a-kind user agent) than is feasible with a normal
>>> user's login.
>>>
>>> Merely enabling cookies by itself presents just as much risk of
>>> forgery.
>>>
>>> Anyway, here's a snip from my @servers:
>>>
>>> @servers = (
>>>     {
>>>         base_url    => 'http://my.domain.com/login.app?_function=checkpw&userid=swishe&password=swishe&remember=no',
>>>         use_cookies => 1,
>>>         # debug     => DEBUG_URL | DEBUG_SKIPPED | DEBUG_FAILED | DEBUG_HEADERS,
>>>         delay_sec   => 1,
>>>         test_url    => sub {
>>>             my $ok = !($_[0]->path =~ /login.app/ &&
>>>                        $_[0]->query =~ /_function=logout/);
>>>             return 1 if $ok;
>>>             return;
>>>         },
>>> ...
>>>
>>> Essentially, the spider logs in as the user 'swishe', so it sees the
>>> same content as any similarly privileged user.
>>> remember=no means don't give swish-e a long-term cookie to
>>> re-authenticate with.
>>> use_cookies allows the application to provide, and swish-e to use,
>>> the session cookies needed for access.
>>> test_url keeps the spider from following a link that logs it out, so
>>> the session stays valid and the spider can follow all the other links.
>
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Feb 20 12:30:06 2008