Re: Unable to spider certain pages, md5 problem

From: Bill Moseley <moseley(at)>
Date: Thu Sep 09 2004 - 17:21:35 GMT
On Thu, Sep 09, 2004 at 05:39:41PM +1000, Tim Hartley wrote:
> However, my test_url function doesn't seem to be working, as I am
> still getting duplicate results caused by Uppercase/Lowercase urls
> to the same content -> /author.asp?author=Joe Blow,
> /author.asp?author=joe blow.
> Debug doesn't throw any errors or warnings regarding the test_url,
> and I've used the code given in the documentation example, but it's
> either not converting the url's to lowercase or if it is it's not
> comparing them successfully. The results are still displaying the
> output with uppercase characters, so I'm assuming it's not
> converting to lowercase.

>          test_url =>sub {
>                       my $uri = shift;
>                       $uri->path(lc$uri->path);
>                       return 1;
>           },

A print statement is a good debugging tool.

perldoc URI will discuss how to use the URI module.  But the short
answer is that the "path" and "query" are two different parts of the
URI, and your code only lowercases the path.
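
To see the difference, here is a small sketch you can run outside the
spider.  The URL is made up for illustration (modeled on the one in
your message); only the URI module's path() and query() methods are
used.

```perl
use strict;
use warnings;
use URI;

# Hypothetical URL, modeled on the example in the quoted message.
my $uri = URI->new('http://example.com/Author.asp?author=Joe+Blow');

# Lowercasing the path leaves the query string untouched.
$uri->path( lc $uri->path );

print "path:  ", $uri->path,  "\n";   # the part before the "?"
print "query: ", $uri->query, "\n";   # the part after the "?"
```

The path comes back lowercased while the query still contains "Joe",
which is why the two author URLs keep being treated as different.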

Now, I know there's a better way to manage query parameters -- I just
can't remember right now, so until then you might try something like:

    test_url => sub {
        my ( $uri ) = @_;
        my %params = $uri->query_form;
        # values() aliases the hash values, so this lowercases them in place
        $_ = lc for values %params;
        $uri->query_form( %params );
        return 1;
    },

The important thing to think about here is that this will break if you
have two parameters with the same name (like a multi-valued parameter).
If that's the case then you likely need to stick to arrays and not use a
hash.
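
A rough illustration of the problem, assuming a made-up URL with a
repeated "tag" parameter: query_form returns a flat key/value list, and
assigning that list to a hash keeps only the last value for each name.

```perl
use strict;
use warnings;
use URI;

# Hypothetical URL with the same parameter name used twice.
my $uri = URI->new('http://example.com/search.asp?tag=Perl&tag=CGI');

# Hash assignment: the second "tag" overwrites the first.
my %as_hash = $uri->query_form;

# Array assignment: every key/value pair survives, in order.
my @as_list = $uri->query_form;

print scalar( keys %as_hash ), " key(s) in the hash\n";
print scalar(@as_list) / 2, " pair(s) in the list\n";
```

Writing %as_hash back with query_form would silently drop one of the
two values, which is the breakage described above.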

Let's see, how about this:

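    test_url => sub {
        my $uri = shift;
        # Flat list: key, value, key, value, ...
        my @params = $uri->query_form;
        return 1 unless @params;
        my $x = 0;
        # Lowercase only the odd-indexed elements (the values)
        $x++ % 2 && ( $_ = lc ) for @params;
        $uri->query_form( @params );
        return 1;
    },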
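
In the spirit of the print statement, one way to check the callback
before handing it to the spider is to run the same logic as a standalone
script.  This is only a sketch: the sub name lowercase_query_values and
the URL are invented for the test, and the body is the array version
from above.

```perl
use strict;
use warnings;
use URI;

# Same logic as the array-based test_url callback, as a named sub.
# (The name is hypothetical, chosen just for this test script.)
sub lowercase_query_values {
    my $uri = shift;
    my @params = $uri->query_form;      # key, value, key, value, ...
    return 1 unless @params;
    my $x = 0;
    $x++ % 2 && ( $_ = lc ) for @params;   # touch the values only
    $uri->query_form(@params);
    return 1;
}

# Hypothetical URL with mixed case and a repeated parameter.
my $uri = URI->new(
    'http://example.com/author.asp?author=Joe+Blow&tag=Perl&tag=CGI');

lowercase_query_values($uri);
print "$uri\n";
```

If the printed URL still shows uppercase values, the problem is in the
callback itself and not in how the spider compares URLs.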

Bill Moseley
Received on Thu Sep 9 10:22:18 2004