Re: Trouble filtering xls with spider.pl

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed Oct 22 2003 - 19:54:23 GMT
On Wed, Oct 22, 2003 at 11:39:04AM -0700, Bruce Pettyjohn wrote:

> The problem comes when trying to start with an html index and crawl through
> the document list containing the Excel files with this command:
>
>         /usr/local/lib/swish-e/spider.pl default http://www.varianinc.com/epindex.htm
> 
> All of the docs are found.  Only the Word docs are filtered.
> 
> Again this works for the individual file:
> 
>         /usr/local/lib/swish-e/spider.pl default http://www.varianinc.com/test.xls

Well, then that's a bug.  Actually, there were two bugs -- I had moved a Makefile
and the filter was getting installed in the wrong location.

I'm sure glad you caught that.  I wrote my test in the "right" order so 
that didn't show up.

Sorry for the trouble.  And the xls filter sure is slow!

I just created a new daily snapshot with the fix:

  http://swish-e.org/dev/swish-daily/swish-e-2.4.0-pr4-2003-10-22.tar.gz


But the fix is not hard if you don't want to reinstall.

In the convert subroutine in Filter.pm:

+    my $done;
     for my $filter ( @filter_set )  {
+        if ( $done ) {
+            push @cur_filters, $filter;
+            next;
+        }
+


             # All done?
-            last unless $doc_object->continue( 0 );
+            $done++ unless $doc_object->continue( 0 );
         }
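
For anyone wondering why swapping "last" for a "$done" flag matters, here is a rough standalone sketch. It is not the real Filter.pm code: run_filters, the $use_flag switch, and the filter names (pdf, doc, xls) are made up for illustration, and it assumes that a filter which does not handle the document is simply kept on @cur_filters. It only shows that leaving the loop with "last" skips the push for every filter that comes later in the list, while the flag lets the loop finish and carry them forward.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filter loop. Each "filter" is just a name,
# and $handles names the one filter that processes the document and then
# reports that no further filtering is wanted.
sub run_filters {
    my ( $use_flag, $handles, @filter_set ) = @_;
    my @cur_filters;    # filters carried forward after this pass
    my $done;

    for my $filter (@filter_set) {
        if ( $use_flag && $done ) {
            # Patched behaviour: keep the filters we never needed to try.
            push @cur_filters, $filter;
            next;
        }

        if ( $filter ne $handles ) {
            # Assumption: a filter that did not handle the doc is kept.
            push @cur_filters, $filter;
            next;
        }

        # This filter handled the document and we are finished with it.
        if ($use_flag) {
            $done++;
        }
        else {
            last;    # Original behaviour: later filters are never pushed.
        }
    }
    return @cur_filters;
}

my @filters = qw( pdf doc xls );
print "last: @{[ run_filters( 0, 'doc', @filters ) ]}\n";    # last: pdf
print "flag: @{[ run_filters( 1, 'doc', @filters ) ]}\n";    # flag: pdf xls
```

In this toy model the xls entry disappears only when it sits after the filter that finished the job, which would fit with the problem depending on the order in which the documents are seen.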


-- 
Bill Moseley
moseley@hank.org
Received on Wed Oct 22 19:54:38 2003