Re: error indexing pdf files

From: Bill Moseley <moseley(at)>
Date: Tue Apr 15 2003 - 13:27:30 GMT
On Tue, 15 Apr 2003, Jody Cleveland wrote:

> My question is, how do I get the spider to only look at a specific folder,
> and nothing else? I looked through the swish-e message archive, and came
> across this, which I added to my
> But, that still indexes all of All I want is
> the citydirs directory.

You can try setting   


If that's not enough, add some print statements to your test_url callback so you can see which paths it is checking:

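    test_url => sub {
        my ($uri, $server) = @_;
        # Log each path checked, but only if the DEBUG_INFO bit is set
        print STDERR "checking path: ", $uri->path, "\n"
            if $server->{debug} & DEBUG_INFO;
        # Skip images
        return if $uri->path =~ /\.(gif|jpeg)$/;
        # Follow only URLs under /citydirs/
        return $uri->path =~ m[^/citydirs/];
    },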

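You can also check the same rules without running the spider at all. Below is a minimal standalone sketch. It assumes the URI module is installed (LWP, which the spider relies on, depends on it). The sub name wanted() and the example.com URLs are hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical standalone check of the test_url rules, run outside the
# spider. Assumes the URI module is installed; host and paths are made up.
use strict;
use warnings;
use URI;

# Same rules as the test_url callback: skip images, keep only /citydirs/.
sub wanted {
    my ($uri) = @_;
    return 0 if $uri->path =~ /\.(gif|jpeg)$/;
    return $uri->path =~ m[^/citydirs/] ? 1 : 0;
}

for my $url (qw(
    http://www.example.com/citydirs/index.html
    http://www.example.com/citydirs/logo.gif
    http://www.example.com/news/today.html
)) {
    printf "%-45s %s\n", $url, wanted(URI->new($url)) ? 'index' : 'skip';
}
```

Only the first URL should be marked "index". The .gif under /citydirs/ and the page outside it are both skipped.
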
Another way to handle this is to index the entire site in one go and use
Swish-e's ExtractPath to set a metaname based on the path. Then, when
searching, you can limit results to specific areas of the index. See the
"select_by_meta" example in the swish.cgi file.

BTW -- are you using keep_alive => 1 when spidering?
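
For reference, keep_alive goes in the same server hash as test_url in the spider config file. A minimal sketch, assuming a config file loaded by the spider that defines @servers; the host name and email address are placeholders:

```perl
# Hypothetical spider config: host, path and email are placeholders.
@servers = (
    {
        base_url   => 'http://www.example.com/citydirs/',
        email      => 'webmaster@example.com',
        keep_alive => 1,    # reuse HTTP connections between requests
        test_url   => sub { $_[0]->path =~ m[^/citydirs/] },
    },
);
1;
```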

Bill Moseley
