Re: spider test_url issue

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sat Apr 09 2005 - 04:54:41 GMT
On Fri, Apr 08, 2005 at 06:09:10PM -0700, Bill Conlon wrote:
> I've narrowed the problem down to my test_url callback, meant to 
> exclude a link:
> /login.taf?_function=logout:
> 
>          test_url    => sub { $_[0]->path !~ /\logout?$/ }

What's the backslash for?  The "?" says zero or one "t".

Try perldoc URI.  The path() method gets the path, and "logout" is
not part of the path, it's part of the query string.

    moseley(at)not-real.laptop:~$ perl -MURI -le 'print URI->new("http://test/login.taf?_function=logout")->path'
    /login.taf
    moseley(at)not-real.laptop:~$ perl -MURI -le 'print URI->new("http://test/login.taf?_function=logout")->query'
    _function=logout
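
Putting those two results together, here is a minimal sketch of a callback that tests the query string instead of the path. The name $skip_logout and the exact pattern are only illustrative, and it assumes the parameter always appears literally as _function=logout:

```perl
use strict;
use warnings;
use URI;

# Hypothetical callback: returns false (skip) when the query string
# contains _function=logout, and true (follow) otherwise.
my $skip_logout = sub {
    my ( $uri ) = @_;
    my $query = $uri->query;          # undef when the URL has no "?"
    return 1 unless defined $query;
    # Match the pair only as a whole parameter, not as part of another one.
    return $query =~ /(?:^|[&;])_function=logout(?:[&;]|$)/ ? 0 : 1;
};

# Try it on a few made-up URLs.
for my $url ( 'http://test/login.taf?_function=logout',
              'http://test/login.taf?_function=login',
              'http://test/index.html' ) {
    my $verdict = $skip_logout->( URI->new($url) ) ? 'follow' : 'skip';
    print "$url => $verdict\n";
}
```

In the spider config this would be hooked up as test_url => $skip_logout.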

Working with the query string is not the easiest part of URI objects.
Look at perldoc URI::QueryParam if you want to do more complex things
with query strings.  Keep in mind that some methods return the query
as-is and some decode it (for example, converting %20 or a "+" to a
space).
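
To see the as-is versus decoded difference, a small sketch along these lines may help. The URL is made up for illustration, and it assumes URI::QueryParam is installed alongside URI:

```perl
use strict;
use warnings;
use URI;
use URI::QueryParam;    # provides query_param() on URI objects

# Made-up URL, for illustration only.
my $uri = URI->new('http://test/search.taf?q=log+out%21&_function=logout');

# query() hands back the string as-is, escapes and all.
print "raw query: [", $uri->query, "]\n";

# query_param() returns the decoded value for one key.
my $q        = $uri->query_param('q');
my $function = $uri->query_param('_function');
print "q:         [$q]\n";
print "_function: [$function]\n";

# Comparing the decoded value avoids having to match escapes by hand.
my $is_logout = ( defined $function && $function eq 'logout' ) ? 1 : 0;
print "is logout: $is_logout\n";
```

Comparing the decoded parameter is usually less fragile than a regex against the raw query, since the same value can be escaped in more than one way.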

When you have questions like this, try something like:

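    test_url => sub {
        my ( $uri ) = @_;
        # Dump what the spider actually hands to the callback.
        print STDERR "Uri = [$uri]\n";
        print STDERR "uri path is [" . $uri->path . "]\n";
        return 1;    # keep following every link while debugging
    },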

Aren't you glad the spider is in Perl instead of C?  Makes debugging
much easier.


-- 
Bill Moseley
moseley@hank.org

Received on Fri Apr 8 21:54:41 2005