On Wed, Jun 09, 2004 at 02:15:56PM -0700, Peter Karman wrote:
> I believe a simple FileFilter config line will work, though it is slower
> than the SWISH::Filter module (Bill, correct me on this):
>
> FileFilter .pdf pdftotext "'%p' -"
Only if not using spider.pl's default config. The default config in
spider.pl automatically filters pdf files (if xpdf programs are found in
the path).
By default I mean passing "default <url>" to spider.pl -- the "default"
tells the spider to use a built-in config. Look at spider.pl in an
editor to see that config -- and how it uses SWISH::Filter.
Otherwise, if you don't pass a parameter to spider.pl it will look for
SwishSpiderConfig.pl (IIRC). The example SwishSpiderConfig.pl file also
has examples of how to use SWISH::Filter.
Basically, you define a content filter in the spider.pl config that passes
the content and the content-type to SWISH::Filter.
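Roughly, such a config could look like the sketch below. This is written
from memory, not copied from the shipped example, so treat the details as
assumptions: the URL and email are placeholders, filter_with_swish is a
made-up name, and the convert()/fetch_doc() calls follow the SWISH::Filter
docs as found in later 2.4.x releases (older releases used a different
call). Check "perldoc SWISH::Filter" and the comments in spider.pl for
your installed version.

```perl
# SwishSpiderConfig.pl -- hypothetical sketch, not the shipped example file
use SWISH::Filter;

# One filter object, reused for every fetched document.
my $filter = SWISH::Filter->new;

# spider.pl reads the @servers array after loading this file.
@servers = (
    {
        base_url       => 'http://www.example.com/',   # placeholder
        email          => 'admin@example.com',         # placeholder
        filter_content => \&filter_with_swish,         # callback below
    },
);

# Called by the spider just before the content is handed to swish-e.
# Return true to index the document, false to skip it.
sub filter_with_swish {
    my ( $uri, $server, $response, $content_ref ) = @_;

    # Hand the content and its content-type to SWISH::Filter.
    # (Assumed API: convert() returns nothing if no filter applied.)
    my $doc = $filter->convert(
        document     => $content_ref,
        content_type => $response->content_type,
        name         => $response->base,
    );

    return 1 unless $doc;    # nothing to filter; index the content as-is

    # Replace the fetched content with the filtered text.
    $$content_ref = ${ $doc->fetch_doc };
    return 1;
}

1;    # a config file loaded by spider.pl should end with a true value
```

With a file like this named SwishSpiderConfig.pl in the current directory,
running spider.pl without a parameter should pick it up.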
Does that make sense?
--
Bill Moseley
moseley@hank.org
Received on Thu Jun 10 00:03:15 2004