
(no subject)

From: Chad Day <CDay(at)not-real.mindshare.net>
Date: Wed Jan 04 2006 - 15:54:12 GMT
I'm not very skilled in Perl, so hopefully someone can shed some
light on this:

spider.conf:

@servers = ({
    base_url => 'http://dev.site.org',
    use_default_config => 1,
    email => 'cday@mindshare.net',
    test_url => \&test_url
});

sub test_url {
        my ($uri, $server) = @_ ;
        return if $uri->query =~ /PHPSESSID/;
}

1;
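
A minimal sketch of a test_url that avoids two problems in the sub above: matching against an undefined query string, and relying on the implicit return value of a bare "return if ..." line. It assumes, as the spider.pl documentation describes, that a true return value means "spider and index this URL" and a false one means "skip it":

```perl
use warnings;

# Callback for spider.pl's test_url option: return true to spider and
# index the URL, false to skip it.
sub test_url {
    my ( $uri, $server ) = @_;

    # $uri->query returns undef when the URL has no "?" part (such as the
    # start URL http://dev.site.org), so test definedness before matching.
    my $query = $uri->query;
    return 0 if defined $query && $query =~ /PHPSESSID/;

    # Accept everything else explicitly. A bare "return if ..." as the last
    # statement hands back the false result of the failed match instead,
    # which rejects every URL without a session id.
    return 1;
}
```

Used in place of the original sub in spider.conf, the existing `test_url => \&test_url` entry in @servers needs no change.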

$ swish-e -c swish.conf -v 3 -S prog
Parsing config file 'swish.conf'
Indexing Data Source: "External-Program"
Indexing "/usr/local/lib/swish-e/spider.pl"
External Program found: /usr/local/lib/swish-e/spider.pl
/usr/local/lib/swish-e/spider.pl: Reading parameters from '/usr/local/apache/htdocs/dev/components/com_swishesearch/spider.conf'
Use of uninitialized value in pattern match (m//) at /usr/local/apache/htdocs/dev/components/com_swishesearch/spider.conf line 11.

Summary for: http://dev.site.org
Skipped: 1  (1.0/sec)

Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
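
Two details of the posted test_url plausibly account for this output. The start URL http://dev.site.org has no query string, so $uri->query returns undef and the pattern match warns. And because "return if ..." is the sub's last statement, a failed match leaves its false result as the return value, so spider.pl (assuming it treats false as "skip") rejects the start page; with nothing fetched, the summary shows "Skipped: 1" and nothing is indexed. The sketch below reproduces both effects; StubURI is a hypothetical stand-in for a real URI object so it runs without the URI module:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a URI object: only query() is needed, and it
# returns undef when there is no query string, as URI does.
package StubURI;
sub new   { my ( $class, $query ) = @_; return bless { query => $query }, $class }
sub query { return $_[0]{query} }

package main;

# The test_url sub as posted (with the quoted-printable "=3D" decoded to "=").
sub original_test_url {
    my ( $uri, $server ) = @_;
    return if $uri->query =~ /PHPSESSID/;
}

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Start URL http://dev.site.org: no query string, so query() is undef.
my $start_ok = original_test_url( StubURI->new(undef), {} );

# A URL with an ordinary query string and no session id.
my $plain_ok = original_test_url( StubURI->new('page=2'), {} );

print "start URL accepted: ", ( $start_ok ? "yes" : "no" ), "\n";
print "plain URL accepted: ", ( $plain_ok ? "yes" : "no" ), "\n";
print "warnings: ", scalar(@warnings), "\n";
```

Run as-is, it should report both URLs as rejected and a single "uninitialized value" warning, coming from the start URL.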

Is there something about the syntax of my test_url sub that I'm missing?
I'm trying to make sure the same pages aren't indexed repeatedly because
the PHPSESSID parameter keeps changing while the site is being spidered.
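
If the aim is one copy of each page rather than skipping session-tagged URLs outright, another option is to strip the session id from the URL. The sketch below is a hypothetical helper, strip_session_id, that works on the raw query string so it runs without the URI module; the commented test_url shows how it might be wired in, assuming your version of spider.pl uses the URI object after test_url has modified it (check the spider.pl documentation before relying on that):

```perl
use strict;
use warnings;

# Hypothetical helper: remove PHPSESSID=... pairs from a query string.
# Splits on "&" or ";" and rejoins with "&", dropping empty segments.
# Returns undef when nothing is left, so that $uri->query(undef) would
# remove the "?" entirely. An explicit "return undef" (not a bare return)
# keeps the result a single value even when called in list context.
sub strip_session_id {
    my ($query) = @_;
    return undef unless defined $query;
    my @kept = grep { length($_) && !/^PHPSESSID(?:=|$)/ } split /[&;]/, $query;
    return @kept ? join( '&', @kept ) : undef;
}

# Possible use in spider.conf (assumes spider.pl honours the modified $uri):
#   sub test_url {
#       my ( $uri, $server ) = @_;
#       $uri->query( strip_session_id( $uri->query ) );
#       return 1;
#   }
```

For example, "page=2&PHPSESSID=abc123" becomes "page=2", and a query consisting only of the session id is removed altogether.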

Any help is greatly appreciated.

Thanks,

Chad Day

*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Wed Jan 4 07:55:15 2006