On 12/03/2007 07:41 PM, Robinson Craig wrote:
> Nevertheless, my question still stands: is there a "standard" way of
> indexing PDF content and metadata?
>
I don't know about "standard." I recommend SWISH::Filter with spider.pl/DirTree.pl and the
-S prog method over the FileFilter directive, because once you start using
DirTree.pl/spider.pl as your aggregators, you (1) gain a lot more flexibility with
respect to filtering, skipping files, etc., and (2) can add more filters transparently by
just dropping new .pm files into the @INC path.
See http://swish-e.org/docs/swish-config.html#filtering_with_swish_filter
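
For illustration, the two approaches look roughly like this in a swish-e config. Treat it
as a sketch rather than a tested setup: every path is a placeholder, and it assumes
pdftotext (from xpdf) is on your PATH and that SWISH::Filter is installed where
DirTree.pl can load it.

```
# --- Option A: FileFilter directive (file system method, -S fs) ---
# Run with: swish-e -c optionA.conf
IndexDir /path/to/docs
IndexOnly .pdf
# %p is replaced with the full path of the file being filtered
FileFilter .pdf pdftotext "'%p' -"
IndexContents TXT* .pdf

# --- Option B: DirTree.pl + SWISH::Filter (prog method) ---
# Run with: swish-e -c optionB.conf -S prog
# (a separate config file; do not combine with Option A)
IndexDir /path/to/DirTree.pl
# Passed to DirTree.pl as the directory to walk
SwishProgParameters /path/to/docs
```

With Option B the filtering decisions live in DirTree.pl and the SWISH::Filter modules
rather than in the config, which is where the extra flexibility comes from.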
NOTE that SWISH::Filter still uses the xpdf tools under the hood, so in the case of PDF
specifically it might be six of one, half a dozen of the other. But I prefer to start
habits that leave me more options in the longer term.
NOTE too that Swish3 will likely not have FileFilter, but instead will use SWISH::Filter
from the start.
--
Peter Karman . peter(at)not-real.peknet.com . http://peknet.com/
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Dec 4 12:48:11 2007