Hello,
Does anybody know of a good Excel parser? I tried the Swish Filters with the following code in my spider.pl:
use lib '/swish-e-2.2.3/filters/SWISH/Filters';
use XLtoHTML;
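sub xl {
    my ( $uri, $server, $response, $content_ref ) = @_;
    # leave anything not served as application/vnd.ms-excel untouched
    return 1 unless $response->content_type eq 'application/vnd.ms-excel';
    # for logging counts
    $server->{counts}{'XLS transformed'}++;
    # replace the raw spreadsheet with the output of XLtoHTML
    $$content_ref = ${XLtoHTML( $content_ref )};
    # squeeze runs of spaces into a single space
    $$content_ref =~ tr/ / /s;
    return 1;
}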
I tried the above, but most of the Excel documents were not indexed.
Roubart Capcap
Received on Wed May 28 16:00:44 2003