> And if that's not enough, simply add some print statements to your
> test_url function.
>
> test_url => sub {
>     my ($uri, $server) = @_;
>     print STDERR "checking path: ", $uri->path, "\n"
>         if $server->{debug} & DEBUG_INFO;
>     return if $uri->path =~ /\.(gif|jpeg)$/;
>     return $uri->path =~ m[^/citydirs/];
> },
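
The filtering logic in that callback can also be tried outside the spider. The following is only a minimal sketch, assuming plain path strings instead of URI objects; the helper name wanted_path and the sample paths are made up for illustration and are not part of spider.pl:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the test_url callback: it takes a path string
# rather than a URI object, so it runs without spider.pl.
sub wanted_path {
    my ($path) = @_;
    return 0 if $path =~ /\.(gif|jpeg)$/;      # skip images
    return $path =~ m[^/citydirs/] ? 1 : 0;    # keep only paths under /citydirs/
}

# Sample paths are invented for illustration.
for my $path (qw(
    /citydirs/1900/index.html
    /citydirs/1900/cover.gif
    /pages/internetguides/family.html
)) {
    printf STDERR "checking path: %s => %s\n",
        $path, wanted_path($path) ? 'spider' : 'skip';
}
```

Only the first sample path passes both checks; the other two are skipped.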
Here's a chunk of what the output looks like:
>> +Fetched 10 Cnt: 1982 http://www.oshkoshpubliclibrary.org/../../../../../../pages/internetguides/family.html 200 OK text/html 14975 parent:http://www.oshkoshpubliclibrary.org/../../../../../../pages/internetguides/guide_index.html
> Another way to do all this is index the entire site in one go and use
> Swish-e's ExtractPath to set a metaname. Then when searching you can
> limit to areas of the index. See the "select_by_meta" example in the
> swish.cgi file.
This sounds like exactly what I'm looking for. I looked at the sample in
swish.cgi:
Xselect_by_meta => {
    method      => 'checkbox_group',
    columns     => 3,
    metaname    => 'site',   # Can't be a metaname used elsewhere!
    values      => [qw/misc mod vhosts other/],
    labels      => {
        misc   => 'General Apache docs',
        mod    => 'Apache Modules',
        vhosts => 'Virtual hosts',
    },
    description => 'Limit search to these areas: ',
},
Are the values the individual directories? Would I use values =>
[qw/citydirs etc/]? How do I activate that function? And where does the
ExtractPath setting go?
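
As a rough illustration of the idea behind the question (this is not Swish-e's ExtractPath syntax, only the kind of regex it might be given), a value for the 'site' metaname could be derived from the first directory of each path. The helper name top_dir and the 'other' fallback are assumptions made for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first directory of a URL path, the sort
# of value a path-extraction rule might store under the 'site' metaname.
sub top_dir {
    my ($path) = @_;
    # 'other' is a made-up fallback for paths with no leading directory.
    return $path =~ m[^/([^/]+)/] ? $1 : 'other';
}

print top_dir('/citydirs/1900/index.html'), "\n";   # prints "citydirs"
print top_dir('/index.html'), "\n";                 # prints "other"
```

If the index were built this way, the values list would presumably hold those directory names, e.g. citydirs.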
> BTW -- are you using keep_alive => 1 when spidering?
Yes.
Sorry for all the questions!
Jody Cleveland
Received on Tue Apr 15 14:14:38 2003