So I tried another way to index the files:
in my swish.conf I use the FileFilter and FileFilterMatch rules for all files, like this:
FileFilter .pdf /usr/bin/pdftotext "'%p' -"
IndexContents TXT .pdf
FileFilter .doc /usr/bin/catdoc "-s8859-1 -d8859-1 '%p'"
IndexContents TXT .doc
FileFilterMatch /usr/bin/ppthtml "'%p'" /\.ppt$/i
IndexContents HTML .html .ppt
StoreDescription HTML* <test:p> 20000
FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxg|sxi)$/i
IndexContents XML* .xml .sxw .sxc .sxg .sxi
StoreDescription XML* <text:p> 20000
FileFilterMatch /usr/bin/xlhtml "'%p'" /\.xls$/i
IndexContents HTML .html .xls
StoreDescription HTML* <test:p> 20000
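
To check each filter by hand before indexing, something like the following may help. It is only a rough sketch: filter_command is a hypothetical helper of mine, not part of swish-e, and the file names are made up. It just prints the command line that the rules above should end up running for a given file, so the command can be copied and run from a shell to see what text comes out.

```shell
#!/bin/sh
# filter_command: hypothetical helper, NOT part of swish-e.
# Prints the filter command line the config above would use for a file,
# based only on the file extension. It does not run the filter itself.
filter_command() {
    file=$1
    case "$file" in
        *.pdf) printf "/usr/bin/pdftotext '%s' -\n" "$file" ;;
        *.doc) printf "/usr/bin/catdoc -s8859-1 -d8859-1 '%s'\n" "$file" ;;
        *.ppt) printf "/usr/bin/ppthtml '%s'\n" "$file" ;;
        *.xls) printf "/usr/bin/xlhtml '%s'\n" "$file" ;;
        # OpenOffice files are zip archives; the text lives in content.xml.
        # The bracket patterns mirror the case-insensitive regex in the config.
        *.[sS][xX][wWcCgGiI]) printf "/usr/bin/unzip -p \"%s\" content.xml\n" "$file" ;;
        *) return 1 ;;  # no filter configured for this extension
    esac
}

# Example with made-up file names: show the command for each one.
for f in report.pdf slides.ppt notes.sxw; do
    filter_command "$f"
done
```

If a printed command gives no useful text when run by hand, swish-e will not get anything to index from that filter either.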
Benoit Guguin wrote:
>Ok thank you,
>
>I have tested with DirTree.pl and it works fine with xls, pdf and doc.
>
>So I'm currently looking to add filters for PowerPoint and OpenOffice
>(sxi, sxw, sxc). But I don't understand the source code :( ...
>
>If someone has already done this, could they share the file, please?
>
>
>Thanks again,
>
>Regards,
>
>Peter Karman wrote:
>
>
>
>>The .pm files:
>>
>> doc2txt.pm
>> pdf2html.pm
>> pdf2xml.pm
>>
>>are example modules that predate (iirc) the SWISH::Filters class. The reason
>>pdf2html works in your script is this line in the pdf2html.pm file:
>>
>> @EXPORT = qw(pdf2html);
>>
>>which tells Perl to make that function available in your script's namespace with
>>the 'use' function.
>>
>>I'd suggest using the DirTree.pl example script instead; it calls SWISH::Filter
>>for you correctly.
>>
>>Benoit Guguin scribbled on 8/19/05 4:45 AM:
>>
>>>Hello,
>>>
>>>I'm trying to index a directory containing only pdf, doc, xls and ppt files.
>>>
>>>
>>>I've seen in version 2.5.4 some Perl scripts to filter .ppt, .xls and .doc.
>>>
>>>I tried to use them with the prog method, but when I run swish-e (
>>>"swish-e -c /etc/swish-e/swish.conf -S prog") I get these errors:
>>>
>>>Undefined subroutine &main::Doc2html called at /etc/swish-e/swish.pl
>>>line 55.
>>>Or
>>>Undefined subroutine &main::pp2hml called at /etc/swish-e/swish.pl
>>>
>>>Which error I get depends on the order of the functions.
>>>
>>>
>>>So I don't understand why it works fine for pdf but not for the other
>>>formats...
>>>
>>>I've looked through the mailing list archive but haven't found my Holy Grail ;)
>>>
>>>Any ideas, please?
>>>
>>>Regards,
--
Guguin Benoit
Société Alixen 2 rue Jean Rostand 91 893 Orsay Cedex France
Tel : 01 69 85 24 13, Fax : 01 69 85 24 10
Received on Fri Aug 19 05:58:24 2005