Re: [swish-e] swish.conf problems - was ignorewords wildcard?

From: <Rene.Kloos(at)not-real.esa.int>
Date: Thu May 24 2007 - 12:58:31 GMT
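Add credential_timeout to your spider config (see:
http://swish-e.org/docs/spider.html). The default is 30 seconds, which
might be a bit too long to wait on every protected directory. Setting it
to undef stops the spider from prompting for a password at all: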

credential_timeout      =>    undef,
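
For context, here is a minimal sketch of how that setting might sit in a complete spider config file, together with a test_url callback covering the other wishes in this thread (only .htm/.html, nothing beginning with "dsc_"). The file name, the email address, and the filter logic are my assumptions, not something from the original poster's setup, so treat it as a starting point and check it against the spider docs.

```perl
# SwishSpiderConfig.pl (hypothetical file name)
# Assumption: spider.pl reads the global @servers array from this file.
@servers = (
    {
        # Placeholder host taken from the swish.conf quoted in this thread.
        base_url => 'http://nottherealsitename.com/',

        # Assumed contact address; replace with your own.
        email    => 'swish@example.com',

        # Do not prompt for a username/password on protected areas.
        credential_timeout => undef,

        # Called with a URI object before each request.
        # Return true to fetch the URL, false to skip it.
        test_url => sub {
            my ($uri) = @_;
            my $path = $uri->path;

            # Skip documents whose file name begins with "dsc_".
            return 0 if $path =~ m{/dsc_[^/]*$}i;

            # Allow directory URLs so links below them can be followed.
            return 1 if $path =~ m{/$};

            # Otherwise fetch only .htm and .html documents.
            return $path =~ /\.html?$/i ? 1 : 0;
        },
    },
);

1;
```

If I read the docs right, SwishProgParameters in swish.conf would then name this file instead of "default". With the prompt disabled, a directory protected by .htaccess should simply come back as an unauthorized response and be skipped rather than making the spider wait.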

René


users-bounces@lists.swish-e.org wrote on 24/05/2007 14:51:59:

> Yeah, it blocks and times out.  Bypassing the .htaccess will speed
> things up.  Not a really big deal, merely a tweak.  Now, of course, it
> has become a challenge.
>
> Rene.Kloos@esa.int wrote:
> > BTW, if using the spider, won't that simply get blocked when coming across a
> > directory with .htaccess? After all I suppose that's what the .htaccess is
> > for, to set up some form of access control. You can provide the spider with
> > the appropriate credentials to get in, but if that's not what you want, then
> > things should be fine. Or is that too simplistic :-)
> >
> > Bye,
> > René
> >
> > users-bounces@lists.swish-e.org wrote on 24/05/2007 13:33:10:
> >
> >> OK, let's start over. . .
> >>
> >> I want to index the site.
> >> Only .htm and .html
> >> I don't want to index directories containing .htaccess
> >> I don't want to index documents beginning with "dsc_"
> >>
> >> --
> >> Swish-e version:  2.4.5
> >> OS:  RH9
> >> Current run string:  swish-e -S prog -c swish.conf
> >>
> >> Current swish.conf:
> >>
> >> # Swish-e config
> >> #
> >> IndexDir spider.pl
> >> IndexFile index.swish-e
> >>
> >> SwishProgParameters default http://nottherealsitename.com/
> >>
> >> IndexReport 3
> >>
> >> Metanames swishtitle swishdocpath
> >>
> >> IndexOnly .htm .html
> >>
> >> IgnoreWords File: /usr/local/swish-e-2.4.5/conf/stopwords/english.txt
> >>
> >> StoreDescription TXT* 10000
> >> StoreDescription HTML* <body> 10000
> >>
> >>
> >> Need some help.
> >>
> >>
> >> Bill Moseley wrote:
> >>> On Wed, May 23, 2007 at 10:35:47PM -0400, Frank Hunt wrote:
> >>>> this fails:
> >>>>
> >>>> IndexDir spider.pl
> >>>> SwishProgParameters default http://website.com/
> >>>> FileRules directory contains ^\.htaccess
> >>>>
> >>>> run string:  swish-e -S prog -c swish.conf2
> >>> -S prog means you are not reading from the file system -- FileRules is
> >>> only for reading from the file system.
> >>>
> >>>
> >>>
> >>>
> >> --
> >> frank hunt
> >> PLUG member-in-absentia
> >> confused linux admin
> >> part time windows(r) washer
> >> rochester hills, mi
> >> _______________________________________________
> >> Users mailing list
> >> Users@lists.swish-e.org
> >> http://lists.swish-e.org/listinfo/users
> >
>
> --
> frank hunt
> PLUG member-in-absentia
> confused linux admin
> part time windows(r) washer
> rochester hills, mi
Received on Thu May 24 08:58:35 2007