Re: Regex question

From: Bill Moseley <moseley(at)>
Date: Thu Dec 02 2004 - 14:19:31 GMT
On Thu, Dec 02, 2004 at 03:10:56AM -0800, Andre wrote:
> Hi,
> To skip all the URLs that contain a query string, I use this function
> in my spider.conf:
> test_url => sub { $_[0]->path !~ /.*\?.*$/ }
> so I can eliminate the URLs that I don't want.
> But this doesn't work. What is wrong?

You should read the URI docs.

$uri->path returns only the *path*, which doesn't contain the query string.

$ perl -MURI -le 'print URI->new("")->path'

$ perl -MURI -le 'print URI->new("")->query'
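A minimal Perl sketch of why the original test never skips anything. It assumes the URI module is installed and uses a hypothetical URL (http://example.com/search?q=perl), since the URLs in the examples above were not preserved:

```perl
use strict;
use warnings;
use URI;

# Hypothetical example URL, not one from the original post.
my $uri = URI->new('http://example.com/search?q=perl');

print 'path:  ', $uri->path,  "\n";   # /search
print 'query: ', $uri->query, "\n";   # q=perl

# The test from the question looks only at the path, which never
# contains the "?", so it returns true (spider it) for every URL.
my $test_url = sub { $_[0]->path !~ /.*\?.*$/ };
print 'spidered? ', ($test_url->($uri) ? 'yes' : 'no'), "\n";
```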

You could match against $uri->path_query for what you are trying above,
but I'd recommend other methods, such as checking $uri->query:

$ perl -MURI -wle 'print URI->new("")->path_query'

$ perl -MURI -wle 'print join "::", URI->new("")->query_form'
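A similar sketch for path_query and query_form, again assuming the URI module and a hypothetical URL:

```perl
use strict;
use warnings;
use URI;

my $uri = URI->new('http://example.com/search?q=perl&page=2');

# path_query: the path, a "?", and the query, as one string.
my $pq = $uri->path_query;

# query_form in list context: the query as key/value pairs.
my @form = $uri->query_form;

print "$pq\n";                    # /search?q=perl&page=2
print join('::', @form), "\n";    # q::perl::page::2
```

Because path_query keeps the "?", the regex from the question would work against it, although testing query directly is simpler.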

If you want to ignore anything with a query string (are you sure you
want to do that?), then:

   test_url => sub { ! $_[0]->query }

But that may not work as you expect: notice the "?" on the second URL,
which makes query() return an empty (but defined) string instead of undef:

$ perl -MURI -wle 'print defined URI->new("")->query ? "yes" : "no"'

$ perl -MURI -wle 'print defined URI->new("")->query ? "yes" : "no"'
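A sketch contrasting the two possible tests, assuming (as in the original question) that test_url should return true for URLs that should be spidered. The callback names and the example.com URLs are hypothetical:

```perl
use strict;
use warnings;
use URI;

# Hypothetical callback: skips only URLs with a non-empty query.
# "page?" is still kept, because its query is "" (false).
my $skip_nonempty_query = sub { ! $_[0]->query };

# Hypothetical callback: skips any URL containing a "?", because
# query() returns "" (defined) rather than undef in that case.
my $skip_any_query = sub { ! defined $_[0]->query };

for my $url ('http://example.com/page',
             'http://example.com/page?',
             'http://example.com/page?id=1') {
    my $uri = URI->new($url);
    printf "%-30s truth test: %-4s defined test: %s\n", $url,
        ($skip_nonempty_query->($uri) ? 'keep' : 'skip'),
        ($skip_any_query->($uri)      ? 'keep' : 'skip');
}
```

Which one you want depends on whether a bare trailing "?" should count as a query string.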

Bill Moseley

