search only .html and no extension files

From: Michael Porcaro <music(at)not-real.recordhall.com>
Date: Tue Nov 08 2005 - 20:50:40 GMT
Hi,

Question 1:  
Let's say I add a new page.  Do I have to spider the whole site again to
index that one page?

Question 2:
I was finally able to spider my site and get the search engine to work.
One problem now:

The spider indexed every single link, even though I instructed it to index
only .html files with this config file, swish.conf:

# Use spider.pl for indexing 
IndexDir spider.pl
IndexOnly .html

It took about 7 hours to spider the whole site with this command:

swish-e -e -S prog -c swish.conf

The index file is 80 MB and contains a lot of useless links.
How can I filter out every page except .html?  Why didn't it obey
the config file?
Received on Tue Nov 8 12:50:43 2005