Since you are using the spider, why not use a spider configuration file with a
test_url subroutine? See the online documentation for spider.pl. That way
you can at least skip the files whose names begin with "dsc_". I don't know how
to avoid directories containing .htaccess.

In your swish-e config:

    SwishProgParameters spider.conf

In your spider.conf:

    my %mySite = (
        use_default_config => 1,
        base_url           => 'http://nottherealsitename.com/',
        test_url           => sub {
            my $uri = shift;
            return 0 if $uri->path =~ /\/dsc_/;
            return 1;
        },
    );
    @servers = ( \%mySite );

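If you want to check the callback before starting a full crawl, a small standalone script along these lines might help. It is only a sketch: it assumes the URI module is installed (spider.pl needs it anyway), and the sample URLs are made up for illustration. Note that the pattern matches any path segment that begins with "dsc_", so a directory named that way is skipped too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;    # spider.pl hands URI objects to test_url

# Same callback as in the spider.conf above.
my $test_url = sub {
    my $uri = shift;
    return 0 if $uri->path =~ /\/dsc_/;
    return 1;
};

# Hypothetical sample URLs, for illustration only.
for my $url (qw(
    http://nottherealsitename.com/about.html
    http://nottherealsitename.com/photos/dsc_0012.html
    http://nottherealsitename.com/dsc_album/index.html
    http://nottherealsitename.com/misc_dsc_notes.htm
)) {
    # A true return value means the spider would fetch the URL.
    my $verdict = $test_url->( URI->new($url) ) ? 'spider' : 'skip';
    print "$verdict  $url\n";
}
```

The first and last URLs should come out as "spider" and the two in the middle as "skip"; the last one passes because "dsc_" is not at the start of a path segment there.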
Good luck!
users-bounces@lists.swish-e.org wrote on 24/05/2007 13:33:10:
> OK, let's start over. . .
>
> I want to index the site.
> Only .htm and .html
> I don't want to index directories containing .htaccess
> I don't want to index documents beginning with "dsc_"
>
> --
> Swish-e version: 2.4.5
> OS: RH9
> Current run string: swish-e -S prog -c swish.conf
>
> Current swish.conf:
>
> # Swish-e config
> #
> IndexDir spider.pl
> IndexFile index.swish-e
>
> SwishProgParameters default http://nottherealsitename.com/
>
> IndexReport 3
>
> Metanames swishtitle swishdocpath
>
> IndexOnly .htm .html
>
> IgnoreWords File: /usr/local/swish-e-2.4.5/conf/stopwords/english.txt
>
> StoreDescription TXT* 10000
> StoreDescription HTML* <body> 10000
>
>
> Need some help.
>
>
> Bill Moseley wrote:
> > On Wed, May 23, 2007 at 10:35:47PM -0400, Frank Hunt wrote:
> >> this fails:
> >>
> >> IndexDir spider.pl
> >> SwishProgParameters default http://website.com/
> >> FileRules directory contains ^\.htaccess
> >>
> >> run string: swish-e -S prog -c swish.conf2
> >
> > -S prog means you are not reading from the file system -- FileRules is
> > only for reading from the file system.
> >
> >
> >
> >
>
> --
> frank hunt
> PLUG member-in-absentia
> confused linux admin
> part time windows(r) washer
> rochester hills, mi
> _______________________________________________
> Users mailing list
> Users@lists.swish-e.org
> http://lists.swish-e.org/listinfo/users
Received on Thu May 24 08:25:29 2007