Re: Regex question

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 02 2004 - 14:19:31 GMT
On Thu, Dec 02, 2004 at 03:10:56AM -0800, Andre wrote:
> Hi,
> 
> To skip all the URLs that contain a query, I use this function in my
> spider.conf:
> 
> test_url => sub { $_[0]->path !~ /.*\?.*$/ }
> 
> so I can eliminate the URLs that I don't want.
> 
> But this doesn't work. What is wrong?

You should read the docs.

$uri->path returns only the *path*, which doesn't contain the query
string, so your regex never sees the "?".

$ perl -MURI -le 'print URI->new("http://test.com/path/is/here?this=query&key=value")->path'
/path/is/here

$ perl -MURI -le 'print URI->new("http://test.com/path/is/here?this=query&key=value")->query'
this=query&key=value


You can use path_query for what you are trying above, but I'd recommend
other methods (like checking the $uri->query method):

$ perl -MURI -wle 'print URI->new("http://test.com/path/is/here?this=query&key=value")->path_query'
/path/is/here?this=query&key=value



$ perl -MURI -wle 'print join "::", URI->new("http://test.com/path/is/here?this=query&key=value")->query_form'
this::query::key::value
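
Since query_form returns the keys and values as one flat list, it can be
assigned straight to a hash. A minimal sketch, assuming no key is repeated
in the query string (a repeated key would overwrite the earlier value); the
variable names are only for illustration:

```perl
use strict;
use warnings;
use URI;

my $uri = URI->new("http://test.com/path/is/here?this=query&key=value");

# query_form returns (key, value, key, value, ...), so a hash assignment
# pairs them up. Assumes each key appears only once.
my %param = $uri->query_form;

print "$_ => $param{$_}\n" for sort keys %param;
```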


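If you want to ignore anything with a query string (are you sure you
want to do that?) then return false when a query is present:

   test_url => sub { !$_[0]->query }

But that may not work like you expect. A URL ending in a bare "?" has
a query that is defined but empty, so the truth test above treats it
as having no query. Notice the "?" on the second one: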

$ perl -MURI -wle 'print defined URI->new("http://test.com/path/is/here")->query ? "yes" : "no"'
no

$ perl -MURI -wle 'print defined URI->new("http://test.com/path/is/here?")->query ? "yes" : "no"'
yes
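
The difference between a defined test and a plain truth test on ->query
can be seen side by side. This is only a sketch: the helper names
has_query_defined and has_query_true are made up for illustration and are
not part of URI or spider.pl.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helpers, not part of spider.pl or URI.

# Strict test: any "?" in the URL counts as a query, even an empty one.
sub has_query_defined { return defined $_[0]->query ? 1 : 0 }

# Loose test: an empty query (bare trailing "?") counts as no query.
# Note that a query string of "0" is also false in Perl.
sub has_query_true { return $_[0]->query ? 1 : 0 }

for my $url (
    'http://test.com/path/is/here',
    'http://test.com/path/is/here?',
    'http://test.com/path/is/here?this=query&key=value',
) {
    my $uri = URI->new($url);
    printf "%-52s defined=%s true=%s\n",
        $url,
        ( has_query_defined($uri) ? 'yes' : 'no' ),
        ( has_query_true($uri)    ? 'yes' : 'no' );
}
```

Assuming a true return value from test_url means "keep this URL" (as in
the original regex attempt), the strict version in spider.conf would be
test_url => sub { !defined $_[0]->query }, which also skips URLs that end
in a bare "?".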




-- 
Bill Moseley
moseley@hank.org

Received on Thu Dec 2 06:19:31 2004