I'm fairly inexperienced with Perl, so hopefully someone can shed a little
light on this:

spider.conf:

@servers = ({
    base_url => 'http://dev.site.org',
    use_default_config => 1,
    email => 'cday@mindshare.net',
    test_url => \&test_url
});

sub test_url {
    my ($uri, $server) = @_;
    return if $uri->query =~ /PHPSESSID/;
}

1;

$ swish-e -c swish.conf -v 3 -S prog
Parsing config file 'swish.conf'
Indexing Data Source: "External-Program"
Indexing "/usr/local/lib/swish-e/spider.pl"
External Program found: /usr/local/lib/swish-e/spider.pl
/usr/local/lib/swish-e/spider.pl: Reading parameters from
'/usr/local/apache/htdocs/dev/components/com_swishesearch/spider.conf'
Use of uninitialized value in pattern match (m//) at
/usr/local/apache/htdocs/dev/components/com_swishesearch/spider.conf
line 11.

Summary for: http://dev.site.org
Skipped: 1 (1.0/sec)

Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!

Is there something wrong with the syntax of my test_url callback? I'm
trying to keep the same pages from being indexed repeatedly, since the
PHPSESSID value in the URLs keeps changing while the site is spidered.
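For comparison, here is a sketch of how the callback might be rewritten. It assumes (as the "Skipped: 1" line suggests) that spider.pl only fetches a URL when test_url returns a true value. Two things differ from the version above: it checks that $uri->query is defined before matching, because it is undef for a URL with no query string (a likely source of the "uninitialized value" warning), and it ends with an explicit `return 1`, because a sub that falls off the end after `return if ...` returns the false result of the failed match. The StubURI package and the loop are hypothetical scaffolding for trying the callback outside the spider; only the sub would go in spider.conf.

```perl
use strict;
use warnings;

# Sketch of a revised spider.conf callback. Assumption: spider.pl indexes
# a URL only when test_url returns true and skips it otherwise.
sub test_url {
    my ( $uri, $server ) = @_;

    # $uri->query is undef for URLs without a query string, so check
    # definedness first to avoid the "uninitialized value" warning.
    my $query = $uri->query;
    return 0 if defined($query) && $query =~ /PHPSESSID/;

    # Return true explicitly; without this, the sub returns the false
    # result of the failed match and every URL gets skipped.
    return 1;
}

# Hypothetical stand-in for a URI object, only for testing the callback.
package StubURI;

sub new {
    my ( $class, $query ) = @_;
    my $self = { query => $query };
    return bless $self, $class;
}

sub query { return $_[0]{query} }

package main;

# Try the callback on a URL with no query, an ordinary query,
# and a query carrying a session ID.
for my $q ( undef, 'page=2', 'PHPSESSID=abc123' ) {
    my $label = defined($q) ? $q : '(no query)';
    my $verdict = test_url( StubURI->new($q), undef ) ? 'indexed' : 'skipped';
    printf "%-18s %s\n", $label, $verdict;
}
```

Run on its own, the loop should report the PHPSESSID URL as skipped and the other two as indexed.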

Any help is greatly appreciated.

Thanks,
Chad Day
Received on Wed Jan 4 07:55:15 2006