Re: Indexing takes forever

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri May 06 2005 - 21:05:58 GMT
Nick scribbled on 5/6/05 3:49 PM:
> swish-e -c /etc/swish.conf -S prog -i DirTree.pl
> I tried that but I got this:
> 
> Indexing Data Source: "External-Program"
> Indexing "DirTree.pl"
> External Program found: /usr/lib/swish-e/DirTree.pl
> Must supply at least one directory
> Usage:
>     DirTree.pl [options] directory <directory...> | swish-e -S prog -i stdin
> 
>       Options:
>         -verbose        Display processing info
>         -debug          Enable debugging (including SWISH::Filter debugging)
>         -man            Display documentation
>         -path           Display location lib path set at installation
>         -no_skip        Process documents even if filtering fails
>         -symlinks       Follow symbolic links.  Default is to NOT follow symlinks
> 
> Removing very common words...
> no words removed.
> Writing main index...
> err: No unique words indexed!

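DirTree.pl is being run without the directory argument it is asking for. Try 
adding this line to your existing config, which passes the directory to the 
external program: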

SwishProgParameters /home/shared

and comment out this line:

# IndexDir "/home/shared"
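
Put together, the relevant part of the config might look like the sketch below. 
It assumes /home/shared is the only tree you want indexed and that everything 
else in your /etc/swish.conf stays as it is; I have not run this against your 
setup.

```
# /etc/swish.conf (fragment; the rest of the file is unchanged)

# Passed as arguments to the program named with -i under -S prog,
# so DirTree.pl receives the directory it is asking for.
SwishProgParameters /home/shared

# Commented out, as suggested above.
# IndexDir "/home/shared"
```

Then rerun the same command you used before (swish-e -c /etc/swish.conf -S prog 
-i DirTree.pl). The "Must supply at least one directory" message should go away 
if the parameter is reaching DirTree.pl.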



> Is there any reason to use SWISH::Filter for performance, or is it just
> supposed to be easier?  To me doing something like this in the config file
> makes more sense, as I understand what it is doing when I tell it about
> each type of file:
> 

I think you're right, in principle. You must be a sysadmin-type: we tend not to 
like the black box approach. ;)

SWISH::Filter lets you drop in new filters without, in theory, changing your 
config. But doing it longhand like you have it should work too. Unless it doesn't...


> IndexContents TXT* .txt
> IndexContents HTML* .htm
> IndexContents HTML* .html
> 
> FileFilter .pdf pdftotext "'%p' -"
> IndexContents TXT* .pdf
> 
> FileFilter .doc catdoc
> IndexContents TXT* .doc
> 
> FileFilter .ppt ppthtml
> IndexContents TXT* .ppt
> 
> 
> But of course I have something wrong in there since I am getting lots of
> errors from catdoc, and also I don't know how to put the excel one in
> there since I think it is a perl script.
> 


-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Fri May 6 14:05:59 2005