Re: search only .html and no extension files

From: Bill Moseley <moseley(at)>
Date: Wed Nov 09 2005 - 04:44:35 GMT

On Tue, Nov 08, 2005 at 12:49:31PM -0800, Michael Porcaro wrote:
> Hi,
> Question 1:  
> Let's say I add a new page.  Do I have to spider the whole site again to
> index the 1 page?

Mostly, yes.

> Question 2:
> I finally was able to spider my site, and get the search engine to work.
> One problem now:
> The spider indexed every single link when I instructed it to index .html
> by using this config file called swish.conf
> # Use for indexing 
> IndexDir
> IndexOnly .html

IndexOnly isn't used when using -S prog input method (i.e. using

> It took about 7 hours to spider the whole site with this command:
> swish-e -e -S prog -c swish.conf
> There are a lot of useless links in the index file which is 80 megs.
> How can I filter out every page except .html?  How come it didn't obey
> the config file?

should cover most of that.
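
Since IndexOnly is not consulted with -S prog, the filtering presumably has to happen in the program that feeds swish-e rather than in swish.conf. As a rough sketch of the kind of test involved (plain Python for illustration only, not the spider's actual configuration syntax; the function name `wanted` is made up here), this keeps URLs whose path ends in .html or has no extension at all:

```python
from urllib.parse import urlsplit
import posixpath


def wanted(url: str) -> bool:
    """Return True if the URL path ends in .html or has no file extension.

    Hypothetical helper for illustration; not part of swish-e or its spider.
    """
    # Look only at the path, so a query string like ?file=x.gif is ignored.
    path = urlsplit(url).path
    # The last path segment is the "file name" (empty for a trailing slash).
    name = posixpath.basename(path)
    # splitext returns ('name', '.ext'); the extension is '' when there is none.
    ext = posixpath.splitext(name)[1].lower()
    return ext in ("", ".html")


if __name__ == "__main__":
    for u in ("http://example.com/docs/page.html",
              "http://example.com/about",
              "http://example.com/img/logo.gif"):
        print(u, wanted(u))
```

The same check would need to be expressed in whatever callback or filter hook the spider program provides, so that unwanted links are rejected before they ever reach the indexer.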

Bill Moseley

