Re: [swish-e] XML parsing not returning Title

From: Peter Karman <peter(at)>
Date: Tue Dec 04 2007 - 17:48:10 GMT
On 12/03/2007 07:41 PM, Robinson Craig wrote:

> Nevertheless, my question still stands: is there a "standard" way of
> indexing PDF content and metadata?

I don't know about standard. I recommend SWISH::Filter with the -S prog method over the
FileFilter directive, just because once you start using -S prog programs as your aggregators,
you (1) gain a lot more flexibility with respect to filtering, skipping files, etc., and
(2) can add more filters transparently by just dropping new .pm files into the @INC path.
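
To make the -S prog idea concrete, here is a minimal sketch of an aggregator program. It is written in Python purely for illustration and does not use SWISH::Filter; in practice you would more likely use Perl with SWISH::Filter. It assumes the -S prog output format as I understand it (a Path-Name header, a Content-Length header in bytes, a blank line, then the content). The names passthrough, FILTERS, filter_for, format_prog_record and emit_tree are all hypothetical, and the PDF conversion step is deliberately left out.

```python
import os
import sys
from typing import Callable, Dict, Optional


def passthrough(raw: bytes) -> bytes:
    """Hypothetical filter: return the content unchanged."""
    return raw


# Hypothetical registry mapping file extensions to filter functions.
# Supporting a new type means adding one entry here; anything not
# listed (for example .pdf in this sketch) is skipped.
FILTERS: Dict[str, Callable[[bytes], bytes]] = {
    ".txt": passthrough,
    ".html": passthrough,
}


def filter_for(path: str) -> Optional[Callable[[bytes], bytes]]:
    """Return the filter for this path, or None if the file should be skipped."""
    ext = os.path.splitext(path)[1].lower()
    return FILTERS.get(ext)


def format_prog_record(path: str, content: bytes) -> bytes:
    """Build one document record: headers, a blank line, then the content.

    Content-Length is the size of the content in bytes.
    """
    header = f"Path-Name: {path}\nContent-Length: {len(content)}\n\n"
    return header.encode("utf-8") + content


def emit_tree(root: str, out) -> None:
    """Walk a directory tree and write one record per file that has a filter."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            convert = filter_for(path)
            if convert is None:
                continue  # no filter registered for this type: skip the file
            with open(path, "rb") as fh:
                content = convert(fh.read())
            out.write(format_prog_record(path, content))


if __name__ == "__main__":
    # Each command-line argument is treated as a directory to walk.
    for root in sys.argv[1:]:
        emit_tree(root, sys.stdout.buffer)
```

The point of the sketch is that the skip and filter decisions live in your own program rather than in the swish-e config, which is where the extra flexibility comes from.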


NOTE that SWISH::Filter still uses xpdf tools under the hood, so in the case of PDF
specifically it might be 6/half-dozen. But I prefer to start habits that leave me more
options in the longer term.

NOTE too that Swish3 will likely not have FileFilter, but instead will use SWISH::Filter
from the start.

