On Tue, Nov 08, 2005 at 12:49:31PM -0800, Michael Porcaro wrote:
> Hi,
>
> Question 1:
> Let's say I add a new page. Do I have to spider the whole site again to
> index the 1 page?
Mostly, yes.
>
> Question 2:
> I finally was able to spider my site, and get the search engine to work.
> One problem now:
>
> The spider indexed every single link when I instructed it to index .html
> by using this config file called swish.conf
>
> # Use spider.pl for indexing
> IndexDir spider.pl
> IndexOnly .html
IndexOnly isn't used with the -S prog input method (i.e., when using
spider.pl).
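
With -S prog, spider.pl decides which URLs get fetched, so any "only
.html" filtering has to be configured in the spider, not in swish.conf.
Below is a minimal sketch of swish.conf that assumes a separate spider
config file, hypothetically named spider.config. That file would hold
the filtering rules, for example a test_url callback; the spider
documentation gives its exact format. SwishProgParameters passes its
arguments to the program named in IndexDir:

    # Use spider.pl for indexing
    IndexDir spider.pl
    # Hand spider.pl its own config file, where URL filtering is defined.
    # "spider.config" is a hypothetical file name.
    SwishProgParameters spider.config
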
>
> It took about 7 hours to spider the whole site with this command:
>
> swish-e -e -S prog -c swish.conf
>
> There are a lot of useless links in the index file, which is 80 megs.
> How can I filter out every page except .html? How come it didn't obey
> the config file?
http://swish-e.org/docs/spider.html should cover most of that.
--
Bill Moseley
moseley@hank.org
Received on Tue Nov 8 20:44:39 2005