Good Excel parser

From: Roubart Capcap <RCapcap(at)not-real.scif.com>
Date: Wed May 28 2003 - 16:00:32 GMT

Hello,

Does anybody know of a good Excel parser? I tried the Swish filters, using the following code in my spider.pl:

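# Make the Swish filter modules visible to spider.pl
use lib '/swish-e-2.2.3/filters/SWISH/Filters';
use XLtoHTML;

# Filter callback: convert Excel responses to HTML before indexing
sub xl {
   my ( $uri, $server, $response, $content_ref ) = @_;
   # Leave anything that is not reported as Excel untouched
   return 1 unless $response->content_type eq 'application/vnd.ms-excel';
   # for logging counts
   $server->{counts}{'XLS transformed'}++;
   # Replace the fetched content with the converted output
   $$content_ref = ${XLtoHTML( $content_ref )};
   # Squeeze runs of spaces into a single space
   $$content_ref =~ tr/ / /s;
   return 1;
}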

I tried the above, but most of the Excel documents were not indexed.
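
One possibility worth ruling out is that the server reports some of the spreadsheets under a Content-Type other than 'application/vnd.ms-excel', in which case the filter above returns early and never converts them. The sketch below is only an illustration of a looser check. The helper name is_excel_type and the list of alternative MIME types are assumptions for this example, not part of Swish-e or spider.pl, and this may not be the cause of the missing documents.

```perl
use strict;
use warnings;

# Assumed list of MIME types that servers sometimes report for .xls files.
# Adjust it to whatever the server logs actually show.
my %EXCEL_TYPES = map { $_ => 1 } qw(
    application/vnd.ms-excel
    application/msexcel
    application/x-msexcel
    application/excel
    application/x-excel
);

# Hypothetical helper: return 1 if the Content-Type value names an Excel
# document, 0 otherwise. Parameters such as "; charset=..." are dropped
# and the comparison ignores case.
sub is_excel_type {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    my ($type) = split /;/, $content_type;
    return 0 unless defined $type;
    $type =~ s/^\s+|\s+$//g;
    return $EXCEL_TYPES{ lc $type } ? 1 : 0;
}
```

In the filter, the content-type test could then read `return 1 unless is_excel_type( $response->content_type );`. Logging the content type of every skipped document would show whether this is really where the spreadsheets are being dropped.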

Roubart Capcap
Received on Wed May 28 16:00:44 2003