scope of indexing with spider.pl

From: Shen Yang <Shen.Yang(at)not-real.ny.frb.org>
Date: Tue Oct 29 2002 - 21:11:45 GMT
I finished installing swish-e 2.2.1 and verified that it is working 
properly by indexing and searching some sample html pages with spider.pl 
via the prog method.
Now that I am ready to index my site, a question occurred to me: how 
does spider.pl know when to stop crawling? Does the spider only index 
pages on a given server and/or domain, or does it follow every link it 
encounters, including links to sites on other servers and/or domains? 
For instance, if my site in the domain ny.frb.org has links to pages on 
www.firstgov.org, does that mean that spider.pl will also index pages 
in the firstgov.org domain? If so, how can one limit spider.pl to index 
only the pages of a certain domain and ignore all pages of other 
domains?
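
To make the question concrete, the kind of scope check I have in mind 
could be sketched as below. This is only an illustration in Python, not 
spider.pl's code or configuration; the function name in_scope and the 
extra_hosts parameter are made up for the example.

```python
from urllib.parse import urljoin, urlsplit


def in_scope(link, base_url, extra_hosts=()):
    """Return True if link stays on base_url's host or an allowed alias.

    Hypothetical helper for illustration; not part of swish-e or spider.pl.
    """
    # Resolve relative links against the page they were found on.
    target = urlsplit(urljoin(base_url, link))
    # Ignore non-web links such as mailto: or ftp:.
    if target.scheme not in ("http", "https"):
        return False
    # Hostnames parsed by urlsplit are already lowercased.
    allowed = {urlsplit(base_url).hostname, *(h.lower() for h in extra_hosts)}
    return target.hostname in allowed


links = ["/about.html", "http://www.firstgov.org/", "mailto:webmaster@ny.frb.org"]
kept = [link for link in links if in_scope(link, "http://ny.frb.org/index.html")]
```

With a check like this, only the link that stays on ny.frb.org would be 
followed, and the link to www.firstgov.org would be skipped.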

-- 
Shen C. Yang
Information Technology Specialist

Federal Reserve Bank of New York - www.newyorkfed.org
Technology Support Division
Internal Communications and Multimedia Services
tel: (212) 720 2857
e-mail: shen.yang@ny.frb.org

Any comments or statements made in this transmission reflect the views of the sender and are not necessarily the views of the Federal Reserve Bank of New York.
Received on Tue Oct 29 21:15:55 2002