Re: Indexing takes forever

From: Nick <newsgroups(at)not-real.2thebatcave.com>
Date: Fri May 06 2005 - 21:13:45 GMT
I tried what you suggested, and now I am getting some of these:

22865 Warning - /home/shared/Accounting/Capital/Update Capital 7-7-04.xls: Character in 'c' format wrapped in pack at /usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ParseExcel.pm line 1790.
Error: Bad annotation action
Failed to set content type for document '/home/shared/Environmental_Community/Environmental/Awards/Independence Examiner Playground Article 12-13-02.mht'
Bad BBD entry!
Broken OLE file. Try using -b switch
Failed to set content type for document '/home/shared/Environmental_Community/Environmental/Training/Thumbs.db'

Do those matter?
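One way to judge whether these warnings matter is to see which file types they hit. A minimal Python sketch, assuming the indexing output was saved to a log file (for example with `> index.log 2>&1`) and that the "Failed to set content type" message reads exactly as in the lines above; the function names are hypothetical:

```python
import re
import sys
from collections import Counter
from pathlib import PurePosixPath

# Assumes the message text matches the swish-e output shown above.
# Caveat: a path that itself contains a single quote will be cut short.
FAILED_RE = re.compile(r"Failed to set content type for document '([^']+)'")


def failed_extensions(log_text: str) -> Counter:
    """Count documents that failed content-type detection, keyed by lowercase extension."""
    counts: Counter = Counter()
    for match in FAILED_RE.finditer(log_text):
        counts[PurePosixPath(match.group(1)).suffix.lower()] += 1
    return counts


def main(argv: list) -> None:
    # Each command-line argument is a saved log file; with no arguments, nothing happens.
    for log_path in argv[1:]:
        with open(log_path, encoding="utf-8", errors="replace") as fh:
            for ext, n in failed_extensions(fh.read()).most_common():
                print(f"{ext or '(no extension)'}\t{n}")


if __name__ == "__main__":
    main(sys.argv)
```

If the counts are dominated by types that were never meant to be searchable (Thumbs.db, for instance, is a Windows thumbnail cache), the warnings are probably noise rather than lost content.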

Also, does the default SWISH::Filter install know about PowerPoint files
too?  I looked in /usr/lib/swish-e/perl/SWISH/Filters, but I only see files
that seem to reference MS Word, MS Excel, PDF, and MP3.  I see that MS
PowerPoint is advertised on your web page as being supported, but there
doesn't seem to be much mention of it.


>
> Nick scribbled on 5/6/05 3:49 PM:
>> swish-e -c /etc/swish.conf -S prog -i DirTree.pl
>> I tried that but I got this:
>>
>> Indexing Data Source: "External-Program"
>> Indexing "DirTree.pl"
>> External Program found: /usr/lib/swish-e/DirTree.pl
>> Must supply at least one directory
>> Usage:
>>     DirTree.pl [options] directory <directory...> | swish-e -S prog -i stdin
>>
>>       Options:
>>         -verbose        Display processing info
>>         -debug          Enable debugging (including SWISH::Filter debugging)
>>         -man            Display documentation
>>         -path           Display location lib path set at installation
>>         -no_skip        Process documents even if filtering fails
>>         -symlinks       Follow symbolic links.  Default is to NOT follow symlinks
>>
>> Removing very common words...
>> no words removed.
>> Writing main index...
>> err: No unique words indexed!
>
> Try adding this line to your existing config:
>
> SwishProgParameters /home/shared
>
> and comment out this line:
>
> # IndexDir "/home/shared"
>
>
>
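Why this works: with `-S prog`, swish-e runs DirTree.pl and reads documents from its standard output, and DirTree.pl needs the directories on its command line (hence "Must supply at least one directory"); SwishProgParameters is what supplies them. As a rough illustration of what such a program does, here is a Python sketch (not DirTree.pl itself, and with no filtering at all) that walks directories and writes each plain-text or HTML file in the header format I understand `-S prog` to expect: `Path-Name`, `Content-Length` in bytes, an optional `Last-Mtime`, a blank line, then exactly that many bytes of content. The function names and the extension list are assumptions; check the swish-e documentation before relying on the exact header set.

```python
import os
import sys
from typing import BinaryIO, Iterable, Tuple


def emit_document(path: str, out: BinaryIO) -> None:
    """Write one file to `out` using the -S prog header format (as assumed above)."""
    with open(path, "rb") as fh:
        body = fh.read()
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"             # byte count, not characters
        f"Last-Mtime: {int(os.path.getmtime(path))}\n"
        "\n"                                          # blank line ends the headers
    )
    out.write(header.encode("utf-8"))
    out.write(body)


def walk_dirs(dirs: Iterable[str], out: BinaryIO,
              exts: Tuple[str, ...] = (".txt", ".htm", ".html")) -> int:
    """Emit every matching file under `dirs`; return how many were written."""
    dirs = list(dirs)
    if not dirs:
        # Mirrors DirTree.pl's complaint when no directory is passed in.
        raise ValueError("Must supply at least one directory")
    count = 0
    for top in dirs:
        # os.walk does not descend into symlinked directories by default,
        # similar to DirTree.pl without -symlinks.
        for root, _subdirs, files in os.walk(top):
            for name in sorted(files):
                if name.lower().endswith(exts):
                    emit_document(os.path.join(root, name), out)
                    count += 1
    return count


if __name__ == "__main__" and sys.argv[1:]:
    walk_dirs(sys.argv[1:], sys.stdout.buffer)
```

Such a script would be hooked up the same way as DirTree.pl: `-S prog -i` pointing at the script, with the directories listed in SwishProgParameters.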
>> Is there any reason to use SWISH::Filter for performance, or is it just
>> supposed to be easier?  To me, doing something like this in the config
>> file makes more sense, because I understand what it is doing when I tell
>> it about each type of file:
>>
>
> I think you're right, in principle. You must be a sysadmin type: we tend
> not to like the black-box approach. ;)
>
> SWISH::Filter lets you drop in new filters and, in theory, not change your
> config. But doing it longhand like you have it should work too. Unless it
> doesn't...
>
>
>> IndexContents TXT* .txt
>> IndexContents HTML* .htm
>> IndexContents HTML* .html
>>
>> FileFilter .pdf pdftotext "'%p' -"
>> IndexContents TXT* .pdf
>>
>> FileFilter .doc catdoc
>> IndexContents TXT* .doc
>>
>> # ppthtml writes HTML, so parse its output as HTML
>> FileFilter .ppt ppthtml
>> IndexContents HTML* .ppt
>>
>>
>> But of course I have something wrong in there, since I am getting lots of
>> errors from catdoc, and I also don't know how to add the Excel one,
>> since I think it is a Perl script.
>>
>
>
> --
> Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
>
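The longhand FileFilter/IndexContents lines quoted above amount to a per-extension lookup: pick a converter command (if any), then a parser for its output. A minimal Python sketch of that lookup, with the table copied from the config except that .ppt is mapped to the HTML parser on the assumption that ppthtml writes HTML. It also assumes, as I understand swish-e's default, that a FileFilter with no options receives the file path as its only argument, and that `%p` in the options is replaced by the path. The names are hypothetical; this is not how swish-e or SWISH::Filter is implemented internally.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Optional, Tuple

# extension -> (filter program, filter options, parser); None program = index as-is.
FILTERS: Dict[str, Tuple[Optional[str], Optional[str], str]] = {
    ".txt": (None, None, "TXT*"),
    ".htm": (None, None, "HTML*"),
    ".html": (None, None, "HTML*"),
    ".pdf": ("pdftotext", "'%p' -", "TXT*"),
    ".doc": ("catdoc", None, "TXT*"),
    ".ppt": ("ppthtml", None, "HTML*"),  # assumption: ppthtml emits HTML
}


def plan(path: str) -> Optional[Tuple[Optional[List[str]], str]]:
    """Return (argv or None, parser) for `path`, or None if no rule matches."""
    entry = FILTERS.get(os.path.splitext(path)[1].lower())
    if entry is None:
        return None  # no rule for this type: the file is skipped
    program, options, parser = entry
    if program is None:
        return None, parser
    # Split the options shell-style first, then substitute %p per token,
    # so paths containing spaces stay a single argument.
    args = shlex.split(options) if options else ["%p"]
    return [program] + [a.replace("%p", path) for a in args], parser


def convert(path: str) -> Optional[Tuple[bytes, str]]:
    """Run the planned filter (if any) and return (content, parser)."""
    result = plan(path)
    if result is None:
        return None
    argv, parser = result
    if argv is None:
        with open(path, "rb") as fh:
            return fh.read(), parser
    # The filter is expected to write the converted text to standard output.
    return subprocess.run(argv, capture_output=True, check=True).stdout, parser
```

Seen this way, an Excel converter is just one more table entry: a FileFilter program only needs to read the file and write text to standard output, and an executable Perl script can do that as well as a compiled tool.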
Received on Fri May 6 14:13:45 2005