Zhou Xiang wrote on 03/26/2009 03:29 PM:
> Hi David,
>
> Thank you for your reply!
> I tested it again today. It shows that the crawler can only index the
> webpages within "http://digital.lib.lehigh.edu". It cannot crawl the pages
> on "rust.cc.lib.lehigh.edu" or any other website, even though I used real
> URLs instead of queries.
> Any ideas about it?
Don't use the old spider (the -S http method).
Use spider.pl instead, with -S prog.
See this documentation:
http://swish-e.org/docs/spider.html
and
http://swish-e.org/docs/swish-faq.html#spidering
Note that with spider.pl there are two config files: one for swish-e and
one for spider.pl.
Your swish-e config file can remain unchanged, except that you should
drop:
MaxDepth 2
TmpDir /usr/local/swish-e-2.4.5/tmp
since those are ignored with the -S prog method.
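
As a rough sketch of the spider.pl side (not tested against your setup),
the spider config is a small Perl file that fills in @servers, one entry
per site to crawl. The file name spider.conf, the contact address, and the
max_depth value are placeholders I made up; check the option names against
the spider.html page above before relying on them.

```perl
# spider.conf (hypothetical name): read by spider.pl, not by swish-e.
#
# The swish-e config would then point at the spider, roughly:
#   IndexDir ./spider.pl
#   SwishProgParameters spider.conf
# and indexing is run as:
#   swish-e -c swish.conf -S prog

@servers = (
    {
        # First site to crawl
        base_url  => 'http://digital.lib.lehigh.edu/',
        email     => 'you@example.edu',   # placeholder contact address
        max_depth => 2,                   # assumed stand-in for the old MaxDepth 2
    },
    {
        # Second site gets its own entry, since it is a different host
        base_url  => 'http://rust.cc.lib.lehigh.edu/',
        email     => 'you@example.edu',   # placeholder contact address
        max_depth => 2,
    },
);

1;  # config files loaded by Perl should return a true value
```

Each entry is crawled on its own, so a host that has no entry here will
not be indexed even if pages on a listed host link to it.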
--
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Thu Mar 26 17:23:27 2009