Thanks for the pointer. I knew that -S http was deprecated. I switched to -S prog and was able to use robots.txt to exclude the directories causing the problem. It is a really simple solution, one that a small amount of documentation (of spider.pl) on the site would have made clear.
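For anyone finding this thread later, a minimal sketch of the kind of robots.txt that does this, placed at the root of the site being spidered. The directory names are hypothetical placeholders, not the ones from the original problem:

```
# robots.txt at the web server's document root
# /private/ and /tmp/ are example paths only
User-agent: *
Disallow: /private/
Disallow: /tmp/
```

This applies to every well-behaved crawler, not only to spider.pl, so it fits best when the directories should stay out of all indexes.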
david_42@hughes.net wrote on 9/26/08 6:02 PM:
> Greetings,
>
> I have read every scrap of documentation on the site. When using -S fs you can
> use FileRules to exclude directories. I cannot find any way of doing this with
> either -S http or -S prog.
>
first, don't use -S http. It is deprecated.
second, with -S prog and spider.pl, check the spider.pl docs for the test_uri callback feature.
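A rough sketch of what such a callback might look like in a spider.pl config file. This is written from memory of the spider.pl docs, where the callback appears as test_url and receives a URI object plus the server hash, so check the name and arguments against your installed version. The base_url, email, and directory names are placeholders:

```perl
# Spider config file passed to spider.pl (example values throughout)
@servers = (
    {
        base_url => 'http://www.example.com/',   # placeholder site
        email    => 'admin@example.com',         # placeholder contact

        # Called before each request; return false to skip the URL.
        # Assumption: callback is named test_url and gets ($uri, $server).
        test_url => sub {
            my ( $uri, $server ) = @_;
            return 0 if $uri->path =~ m{^/(?:private|tmp)/};   # example dirs
            return 1;
        },
    },
);

1;
```

Unlike robots.txt, this only affects the swish-e spider and needs no changes on the web server.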
--
Peter Karman . http://peknet.com/ . peter@peknet.com