Re: Excel parser again

From: <moseley(at)not-real.hank.org>
Date: Thu Aug 28 2003 - 15:50:58 GMT
On Thu, Aug 28, 2003 at 01:04:45AM -0700, Bucharow Leonard wrote:
> Hi Bill,
> 
> thanks for the note that there is an example in SpiderConfig.pl for using
> Swish::Filter (unfortunately not for using Swish::Filters::XLtoHTML)!

I think you missed my point.

There will never be an example of using SWISH::Filters::XLtoHTML because
the module is not meant to be used by itself.


> I'm trying to test Swish::Filter from the shell again. Now I get debugging
> messages (I've set $testing to 1 in Filter.pm).
> 
> > # perl Filter.pm test /usr/home/swish-e/test/excel.xls
> > Testing mode for Filter.pm
> >
> > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> > No SWISH filters found
> > 1550 Error- /usr/home/swish-e/test/excel.xls: Can't use an undefined value as an ARRAY reference at Filter.pm line 341.
> 
> I mean I've set the path:
> > # export PERL5LIB='/usr/local/swish-e/lib/SWISH/Filters'
> 
> Can you help me with this error please?

After the last question about this I added (now in CVS and swish-daily 
versions) a program called swish-filter-test which makes all that 
testing much easier.

> # export PERL5LIB='/usr/local/swish-e/lib/SWISH/Filters'

Does that "#" mean you are running as root?  You don't need root -- and 
should not use it for this.

Second, PERL5LIB is a list of directories to search.  The module name is 
turned into a relative path (SWISH/Filters/XLtoHTML.pm) and appended to 
each directory.  With your PERL5LIB setting above it's trying to load the 
filter at:

  /usr/local/swish-e/lib/SWISH/Filters/SWISH/Filters/XLtoHTML.pm

What you need is

   PERL5LIB=/usr/local/swish-e/lib

if that is where the modules really are.
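
To make the lookup concrete, here is a rough shell sketch of how the 
path gets built.  The function name module_candidates is made up for 
this example and is not part of swish-e; it only mimics what Perl does 
with the PERL5LIB directories (Perl also searches the rest of @INC, which 
is ignored here).

```shell
#!/bin/sh
# module_candidates MODULE [SEARCH_PATH]
# Hypothetical helper: print the file Perl would look for in each
# directory of SEARCH_PATH (colon-separated, defaults to $PERL5LIB).
module_candidates() {
    # SWISH::Filters::XLtoHTML -> SWISH/Filters/XLtoHTML.pm
    rel=$(printf '%s\n' "$1" | sed 's|::|/|g').pm
    search=${2:-$PERL5LIB}
    old_ifs=$IFS
    IFS=:
    for dir in $search; do
        printf '%s/%s\n' "$dir" "$rel"
    done
    IFS=$old_ifs
}

# Pointing at the Filters directory itself doubles the SWISH/Filters part:
module_candidates SWISH::Filters::XLtoHTML /usr/local/swish-e/lib/SWISH/Filters

# Pointing at the directory that contains SWISH/ gives a sensible path:
module_candidates SWISH::Filters::XLtoHTML /usr/local/swish-e/lib
```

The first call prints the doubled path shown above; the second prints a 
path that can exist, provided the modules are installed under that lib 
directory.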

Again, all that's much easier in the current swish-daily version -- you 
don't need to set any paths as it's done automatically at installation 
time.

> Other question:
> In Filter.pm, %mime_types originally has no MIME type for Excel:
> 
> # Here's some common mime types
> my %mime_types = (
>     doc   => 'application/msword',
>     pdf   => 'application/pdf',
>     html  => 'text/html',
>     htm   => 'text/html',
>     txt   => 'text/plain',
>     text  => 'text/plain',
>     xml   => 'text/xml',
>     mp3   => 'audio/mpeg',
> );
> 
> Should I add a mime type for Excel files?

I suppose it's a good idea -- but not really necessary.  When run from 
spider.pl the web server sends the content type.  When filtering 
files from the file system (-S fs), SWISH::Filter uses a Perl module to 
get the mime type, and only if that module is not installed for some 
reason is the %mime_types hash above used as a fallback.  So normally 
the hash is not used.
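
As an illustration only, the extension fallback amounts to something 
like the shell sketch below.  This is not the code in Filter.pm (that 
is the Perl hash quoted above); guess_mime is a made-up name, and the 
xls entry with application/vnd.ms-excel is my assumption of what you 
would add for Excel.

```shell
#!/bin/sh
# guess_mime FILE
# Hypothetical stand-in for the %mime_types fallback: map a file
# extension to a content type, or fail if the extension is unknown.
guess_mime() {
    ext=${1##*.}    # text after the last dot
    case $ext in
        doc)      echo application/msword ;;
        pdf)      echo application/pdf ;;
        html|htm) echo text/html ;;
        txt|text) echo text/plain ;;
        xml)      echo text/xml ;;
        mp3)      echo audio/mpeg ;;
        xls)      echo application/vnd.ms-excel ;;  # assumed new entry
        *)        return 1 ;;                       # no fallback type known
    esac
}

guess_mime /usr/home/swish-e/test/excel.xls
```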




moseley@bumby:~$ swish-filter-test test.pdf
** /usr/local/bin/swish-filter-test:
  Failed to open 'test.pdf': No such file or directory


moseley@bumby:~$ swish-filter-test apache/test.pdf

Document apache/test.pdf was  filtered.
   Document:     apache/test.pdf
   Content-Type: text/html  (initial was application/pdf)
   Parser type:  HTML*


moseley@bumby:~$ swish-filter-test                
Must specify a file or URL
Usage:
    swish-filter-test [options] <file or url> <...>

     Options:
       -quiet           don't generate messages to stderr
       -content         output content to stderr
       -(no)skip_binary skip output of binary files (default)
       -lines <num>     Number of lines of content to display to stderr if verbose
       -headers         output with headers for swish-e -S prog method 
       -help            brief help message
       -man             full documentation

moseley@bumby:~$ 


-- 
Bill Moseley
moseley@hank.org
Received on Thu Aug 28 15:51:49 2003