Re: Filters on WinNT

From: Bill Moseley <moseley(at)>
Date: Tue Nov 06 2001 - 13:53:10 GMT
At 04:23 AM 11/6/2001 -0800, Klaus Hollenbach wrote:

>FilterDir C:/path/to/perl/script
>FileFilter .pdf
>IndexDir C:/test/swish

(I tend to just use a full path in the FileFilter and not use FilterDir.)

So, can you use #! on Win32?  Or do you have to say:

FileFilter .pdf "perl %p"

>--- perl script begin ---
>$Program= "path/to/program/pdftotext.exe";
># remove single quotes form parameter     (1)
>$Input = $ARGV[0];
>$Input =~ s/\'//g;

Don't need to do that now.  For debugging I'd do:

print STDERR "Input file:'$Input'\n";

>(Swish passes the filname to the associated program/script in single  )
>(quotes which gets misinterpreted by pdftotext. Unfortunately I       )
>(couldn't change the default values of the FileFilter-Directive using )
>(something like                                                       )
>(---                                                                  )
>(FileFilter .pdf pdftotext.exe "%p -"                                 )

Oh, so all your Perl program is doing is calling pdftotext?  I hope the
documentation is reasonably clear that calling a Perl program just to run
another program will really slow down indexing.  Just call the program directly.

That looks like the right command, but maybe pdftotext isn't in your PATH?
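
One quick way to rule out the PATH question on a Unix-like shell is sketched below. The helper name find_filter is made up for illustration and is not part of swish-e; it only wraps the standard "command -v" lookup. On Windows you would inspect the PATH variable instead, so treat this as a rough sketch rather than a recipe:

```shell
# Hypothetical helper (not part of swish-e): report where a program
# is found on PATH, or say that it could not be found.
find_filter() {
    prog="$1"
    if location=$(command -v "$prog" 2>/dev/null); then
        echo "found: $location"
    else
        echo "not found on PATH: $prog"
        return 1
    fi
}

# Check the filter program named in the FileFilter line.
find_filter pdftotext || echo "use the full path to pdftotext in FileFilter instead"
```

If the program is found, running it by hand on one PDF with "-" as the output file should print the extracted text to stdout, which is what the FileFilter line relies on.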

>(this produces "err: FileFilter requires two values"                  )

You need to upgrade swish; my version doesn't give that error.

This is on linux:

> cat c
FileFilter .pdf pdftotext "%p -"

> ./swish-e -c c -i /usr/X11R6/lib/X11/xfig/xfig.pdf
Indexing Data Source: "File-System"
Indexing "/usr/X11R6/lib/X11/xfig/xfig.pdf"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 2195 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
2195 unique words indexed.
4 properties sorted.
1 file indexed.  169502 total bytes.
Elapsed time: 00:00:01 CPU time: 00:00:00
Indexing done!

Bill Moseley
Received on Tue Nov 6 13:53:49 2001