
Re: Indexing remote web sites with SWISH++

From: Paul J. Lucas <pjl(at)not-real.ptolemy.arc.nasa.gov>
Date: Wed Dec 30 1998 - 00:49:44 GMT
On Mon, 28 Dec 1998, I wrote:

> 	If local filesystem space is an issue, i.e., you don't want to
> 	copy an entire other web site to your local filesystem as you
> 	index it, I'm sure it would be possible to write a slightly
> 	more complicated Perl script that would delete the files after
> 	they are indexed as the get/index cycle progresses.  You'd
> 	probably end up doing something using the IPC::Open2 Perl module
> 	(see the Perl 5 "Camel" book, p. 344): open a bidirectional
> 	pipe to index with the -v3 option so the script could tell
> 	when a file has been indexed and then delete it safely.

	I've done just that by creating an httpindex command.  You can
	tell it to do nothing with the copied files, to delete them as
	indexing progresses (as described above), or to replace them
	with their descriptions as extracted by the
	extract_description() function in my WWW.pm Perl module.

	Hence, to index files on remote servers, the functionality was
	added "externally" to SWISH++ and none of the C++ code had to be
	modified.

	I've put up SWISH++ 1.5b1:

	ftp://shell3.ba.best.com/pub/pjl/software/swish++-1.5b1.tar.gz

	There are also text and PDF versions of the man page for
	httpindex(1).  Feedback is appreciated.

	- Paul
Received on Tue Dec 29 16:50:00 1998