error indexing pdf files

From: Jody Cleveland <Cleveland(at)not-real.mail.winnefox.org>
Date: Tue Apr 15 2003 - 14:10:39 GMT
> And if that's not enough simply add some print statements to 
> your test_url
> function.
> 
>     test_url => sub {
>         my ($uri, $server) = @_;
>         print STDERR "checking path: ", $uri->path, "\n"
>             if $server->{debug} & DEBUG_INFO;
>         return if $uri->path =~ /\.(gif|jpeg)$/;
>         return $uri->path =~ m[^/citydirs/];
>     },

Here's a chunk of what the output looks like:
>> +Fetched 10 Cnt: 1982 http://www.oshkoshpubliclibrary.org/../../../../../../pages/internetguides/family.html 200 OK text/html 14975 parent:http://www.oshkoshpubliclibrary.org/../../../../../../pages/internetguides/guide_index.html

> Another way to do all this is index the entire site in one go and use
> Swish-e's ExtractPath to set a metaname.  Then when searching you can
> limit to areas of the index.  See the "select_by_meta" example in the
> swish.cgi file.
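
For reference, a rough sketch of what the indexing side of that suggestion might look like in the Swish-e config file. It assumes the metaname is called "site" (as in the swish.cgi sample) and that the first directory of each spidered URL is the value wanted. The regular expression and the "other" default are illustrative assumptions, not taken from a working setup, and should be checked against the Swish-e configuration documentation for the installed version.

```
# Declare the metaname so it can be searched (assumed name: site)
MetaNames site

# Assumption: use the first path segment of each spidered URL as the value,
# e.g. http://www.oshkoshpubliclibrary.org/citydirs/... would give "citydirs"
ExtractPath site regex !^http://[^/]+/([^/]+)/.*$!$1!

# Assumption: paths that do not match the pattern fall back to "other"
ExtractPathDefault site other
```

With something like this in place, a query such as site=citydirs should be limited to documents under that directory, and the values listed for select_by_meta would be the strings the pattern extracts.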

This sounds like exactly what I'm looking for. I looked at the sample in
swish.cgi:
        Xselect_by_meta  => {
            method      => 'checkbox_group',
            columns     => 3,
            metaname    => 'site',     # Can't be a metaname used elsewhere!
            values      => [qw/misc mod vhosts other/],
            labels  => {
                misc    => 'General Apache docs',
                mod     => 'Apache Modules',
                vhosts  => 'Virtual hosts',
            },
            description => 'Limit search to these areas: ',
        },

Are the values the individual directories? Would I use values =>
[qw/citydirs etc/]? How do I activate that option? Also, where does the
ExtractPath directive go?

> BTW -- are you using keep_alive => 1 when spidering?
Yes.

Sorry for all the questions!

Jody Cleveland
Received on Tue Apr 15 14:14:38 2003