Christoph Lechner wrote:
> swish-filter-test still dumps the textual contents of the file.
When I run
/usr/local/bin/swish-filter-test --content --verbose --depreciated ATmega16.pdf
I get exactly the same error message as from swish_filter.pl:
** /usr/local/bin/swish-filter-test:
Can't locate object method "filter" via package "SWISH::Filter" at /usr/local/bin/swish-filter-test line 178.
Without the --depreciated option, the contents are printed to stdout.
So swish_filter.pl uses the old, deprecated interface. I think this
should be considered a bug, so I modified swish_filter.pl to use the
new interface.
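The attached script is not reproduced in this archive. As a rough sketch only, a FileFilter script using the newer interface might look like the following. It assumes SWISH::Filter provides new(), a convert() method taking document, name and content_type, and a returned document object with was_filtered() and fetch_doc() (the latter returning a scalar reference); that is my reading of the module's documentation, so please check it against the installed version. The extension-to-MIME-type table and the helper names guess_content_type() and convert_args() are hypothetical, and the script assumes swish-e passes "%p %P" as two arguments (the file to read, then the original path or URL).

```perl
#!/usr/bin/perl
# Hypothetical sketch of a swish-e FileFilter script that calls the
# newer SWISH::Filter convert() method instead of the deprecated filter().
# Assumed invocation (from swish.conf): swish_filter.pl "%p %P"
use strict;
use warnings;

# Hypothetical extension -> MIME type table; extend as needed.
my %TYPE_BY_EXT = (
    pdf => 'application/pdf',
    doc => 'application/msword',
);

# Guess the content type from the original path/URL, ignoring any
# query string or fragment. Returns undef for unknown extensions.
sub guess_content_type {
    my ($path) = @_;
    my ($ext) = $path =~ /\.(\w+)(?:[?#].*)?$/;
    return defined $ext ? $TYPE_BY_EXT{ lc $ext } : undef;
}

# Build the argument list for SWISH::Filter->convert() (assumed keys).
sub convert_args {
    my ( $work_path, $real_path ) = @_;
    my @args = ( document => $work_path, name => $real_path );
    my $type = guess_content_type($real_path);
    push @args, content_type => $type if defined $type;
    return @args;
}

sub main {
    my ( $work_path, $real_path ) = @_;
    $real_path = $work_path unless defined $real_path;

    # Loaded at run time, so the helpers above work without the module.
    require SWISH::Filter;
    my $filter = SWISH::Filter->new;
    my $doc    = $filter->convert( convert_args( $work_path, $real_path ) );

    # Print the converted text only if a filter actually handled it.
    if ( $doc && $doc->was_filtered ) {
        print ${ $doc->fetch_doc };
    }
    else {
        warn "$0: no filter converted '$real_path'\n";
    }
    return 0;
}

# swish-e always passes the file paths; run only when arguments are given.
exit main(@ARGV) if @ARGV;
```

With the swish.conf below, swish-e would run it as ./swish_filter.pl followed by the file path and the URL. The content type is taken from the second argument because the file swish-e hands over may not carry a .pdf extension.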
From the docs of spider.pl I see that the spider tries to filter PDF
files itself if the filter tools are installed. That's no good, because
pdftotext then gets garbage as input rather than a PDF file.
My swish.conf is:
--> swish.conf <--
IndexDir spider.pl
SwishProgParameters spider.conf
FileFilter .pdf ./swish_filter.pl "%p %P"
IndexContents HTML .pdf
StoreDescription HTML* <body>
IndexReport 2
# Allow extra searching by title, path
Metanames swishtitle swishdocpath
--> end <--
My spider.conf is:
--> spider.conf <--
my %kb_site = (
base_url => 'http://kb/kb/tb/',
max_size => 100000000,
ignore_robots_file => 1
);
@servers = ( \%kb_site );
1;
--> end <--
Please find the modified, working swish_filter.pl attached.
- cl
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Sat Sep 5 08:22:44 2009