On Thu, Dec 02, 2004 at 03:10:56AM -0800, Andre wrote:
> Hi,
>
> To skip all the URLs that contain a query string, I use this function in
> my spider.conf:
>
> test_url => sub { $_[0]->path !~ /.*\?.*$/ }
>
> so that I can eliminate the URLs I don't want.
>
> But this doesn't work. What is wrong?
You should read the URI docs.
$uri->path returns only the *path*, which doesn't contain the query string,
so your regex never sees the "?":
$ perl -MURI -le 'print URI->new("http://test.com/path/is/here?this=query&key=value")->path'
/path/is/here
$ perl -MURI -le 'print URI->new("http://test.com/path/is/here?this=query&key=value")->query'
this=query&key=value
You can use path_query (which does include the query string) for what you
are trying above, but I'd recommend other methods, such as checking
$uri->query or $uri->query_form:
$ perl -MURI -wle 'print URI->new("http://test.com/path/is/here?this=query&key=value")->path_query'
/path/is/here?this=query&key=value
$ perl -MURI -wle 'print join "::", URI->new("http://test.com/path/is/here?this=query&key=value")->query_form'
this::query::key::value
If you want to skip anything with a query string (are you sure you
want to do that?), then have test_url return false when there is one:
test_url => sub { ! $_[0]->query }
But that may not work as you expect, because a "?" with nothing after it
gives a query that is defined but empty (and therefore false). Notice the
"?" on the second one:
$ perl -MURI -wle 'print defined URI->new("http://test.com/path/is/here")->query ? "yes" : "no"'
no
$ perl -MURI -wle 'print defined URI->new("http://test.com/path/is/here?")->query ? "yes" : "no"'
yes
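If you do want a bare "?" to count as a query, test with defined instead
of plain truth. Below is a small standalone sketch of that, assuming (as in
the spider.conf snippet above) that test_url is handed the URI object as its
first argument and should return true for URLs to keep. The name
$skip_queries is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical test_url-style callback: gets a URI object, returns true to
# keep the URL and false to skip it. Assumption: any "?" at all, even with an
# empty query after it, should cause the URL to be skipped.
my $skip_queries = sub {
    my ($uri) = @_;
    # query is undef only when the URL has no "?" at all;
    # "path?" gives "" (defined but false), so a plain truth test would keep it.
    return !defined $uri->query;
};

for my $url (
    'http://test.com/path/is/here',
    'http://test.com/path/is/here?',
    'http://test.com/path/is/here?this=query&key=value',
) {
    my $verdict = $skip_queries->( URI->new($url) ) ? 'keep' : 'skip';
    print "$verdict $url\n";
}
```

In spider.conf that would be test_url => sub { !defined $_[0]->query }.
Using defined also catches a query like "?0", which is false in Perl and
would slip past the ! $_[0]->query version.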
--
Bill Moseley
moseley@hank.org
Received on Thu Dec 2 06:19:31 2004