Re: search only .html and no extension files

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Nov 09 2005 - 04:44:35 GMT
On Tue, Nov 08, 2005 at 12:49:31PM -0800, Michael Porcaro wrote:
> Hi,
> 
> Question 1:  
> Let's say I add a new page.  Do I have to spider the whole site again to
> index that one page?

Mostly, yes.


> 
> Question 2:
> I finally was able to spider my site, and get the search engine to work.
> One problem now:
> 
> The spider indexed every single link, even though I instructed it to
> index only .html files in this config file, swish.conf:
> 
> # Use spider.pl for indexing 
> IndexDir spider.pl
> IndexOnly .html

IndexOnly isn't used with the -S prog input method (i.e. when using
spider.pl).
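
For illustration, here is a minimal sketch of how that filtering might
be done on the spider side instead, assuming spider.pl's Perl config
file format and its test_url callback as described in the spider docs.
The file name spider.config, the example.com URL, and the email address
are placeholders, not taken from this thread.

```perl
# spider.config (hypothetical file name), read by spider.pl
@servers = (
    {
        base_url => 'http://www.example.com/',   # placeholder site
        email    => 'webmaster@example.com',     # placeholder contact

        # Called for each URL before it is fetched; a true value keeps it.
        test_url => sub {
            my ( $uri, $server ) = @_;
            my $path = $uri->path;
            return 1 if $path =~ /\.html$/i;     # .html pages
            return 1 if $path !~ m{\.[^/]*$};    # last segment has no extension
            return 0;                            # skip everything else
        },
    },
);

1;
```

With something like this, swish.conf would presumably name the file via
SwishProgParameters (e.g. "SwishProgParameters spider.config") next to
the existing IndexDir line; check the spider docs for the exact options
in your version.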


> 
> It took about 7 hours to spider the whole site with this command:
> 
> Swish-e -e -S prog -c swish.conf
> 
> There are a lot of useless links in the index file which is 80 megs.
> How can I filter out every page except .html?  How come it didn't obey
> the config file?

http://swish-e.org/docs/spider.html should cover most of that.

-- 
Bill Moseley
moseley@hank.org

Received on Tue Nov 8 20:44:39 2005