Re: Antw: [SWISH-E:424] Re: Indexing PDF

From: Paul J. Lucas <pjl(at)not-real.ptolemy.arc.nasa.gov>
Date: Tue Aug 11 1998 - 22:55:57 GMT

On Tue, 11 Aug 1998, Rainer Scherg RTC wrote:

> I'm using a very simple filter prog to index Winword Docs on our servers:
> cat $1 | strings

	The extraction code I wrote as part of SWISH++ is basically
	a more sophisticated version of strings: strings prints any
	run of 4 or more printable characters (its default); SWISH++'s
	"extract" also discards ASCII sequences representing
	hexadecimal numbers (for embedded images), PostScript keywords
	(for embedded EPSF), and a few other things, so it should
	always do a better job.
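
	A rough sketch of that idea in Python, for illustration only.
	This is not SWISH++'s actual code: the names, the 8-digit
	hexadecimal threshold, and the short PostScript word list are
	all assumptions made up for the example.

```python
import re

MIN_RUN = 4  # assumed minimum run length, matching the strings(1) default

# Runs of printable ASCII (space through tilde) at least MIN_RUN bytes long.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_RUN)

# A token made up only of hex digits; the 8-character threshold is an assumption.
HEX_TOKEN = re.compile(r"[0-9A-Fa-f]{8,}")

# A tiny, hypothetical sample of PostScript operators to discard.
POSTSCRIPT_WORDS = {
    "gsave", "grestore", "moveto", "lineto",
    "newpath", "closepath", "setlinewidth", "showpage",
}


def extract_words(data):
    """Return whitespace-separated words from printable runs in `data`,
    skipping long hex tokens and known PostScript keywords."""
    words = []
    for run in PRINTABLE_RUN.findall(data):
        for token in run.decode("ascii").split():
            if HEX_TOKEN.fullmatch(token):
                continue  # likely hex-encoded image data
            if token in POSTSCRIPT_WORDS:
                continue  # likely embedded EPSF
            words.append(token)
    return words


if __name__ == "__main__":
    sample = b"\x00\x01Quarterly report\x00\xff4A6F686E20446F65\x00gsave newpath\x00ab\x00"
    print(extract_words(sample))
```

	A plain strings pass over the same bytes would also emit the
	hex run and the PostScript operators, which is the noise the
	extra filtering is meant to keep out of the index.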

> BTW: Some words to swish++
>   So far I've only read the Readme file (some weeks ago).
>   But for my personal flavor swish++ is lacking some features I want to
>   see (e.g. Config Files) at this moment.

	The README explains why config files were intentionally not
	implemented.

	- Paul
Received on Tue Aug 11 16:05:34 1998