Re: Parsing doc, xls and excel files with swish-e and libxml2

From: David L Norris <dave(at)not-real.webaugur.com>
Date: Tue Jun 28 2005 - 06:21:54 GMT
On Tue, 2005-06-28 at 11:18 +0530, Animesh Bansriyar wrote:
> I cannot ask all users to have perl on their systems as well.

I'm not sure why you think you need Perl.  Perl is not required for
anything.

> What are the chances of adding in a native parser for all document formats 
> onto swish-e itself? I would love to contribute if this is possible and feasible.

"Native" filters are installed by the Swish-e Windows installer for Word
(catdoc) and PDF (pdftotext) documents.  You can use catdoc, wvware,
xpdf, or any other program that converts a document to plain text, HTML,
or XML by naming it in a FileFilter directive during indexing:
  http://swish-e.org/docs/swish-config.html#item_filefilter


For Word documents your config file might look like this:
  FileFilter .doc catdoc.exe '-s8859-1 -d8859-1 "%p"'
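
For PDF documents the equivalent would probably look something like the
lines below.  Treat this as an untested sketch modelled on the example
in the FileFilter documentation: the trailing "-" asks pdftotext to
write to standard output, and the IndexContents line is an assumption
that you want the filtered output parsed as plain text rather than HTML:
  FileFilter .pdf pdftotext.exe '"%p" -'
  IndexContents TXT* .pdf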


Catdoc and pdftotext are included with the Swish-e for Windows builds.
You can place additional filter programs into the lib\swish-e directory
of your installation.  So if you install to c:\swish-e then you would
place additional filters in the c:\swish-e\lib\swish-e\ directory.

-- 
 David Norris
  http://www.webaugur.com/dave/
  ICQ - 412039
Received on Mon Jun 27 23:21:55 2005