Displaying Word Documents(.doc) metanames using Catdoc

From: Jono Tan <jonotan(at)not-real.hotmail.com>
Date: Mon May 23 2005 - 04:34:50 GMT
Hey guys,

I'm not sure how to search and display .doc metadata from the swish-e 
index.  For example, I want to search for a Word document whose author is 
Joe Doe (this should in turn search the metaname author = Joe Doe).  I can 
get this working with OpenOffice files, since I can look at the "meta.xml" 
file, but I cannot do it with Word documents (filtered through catdoc).
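
To make the OpenOffice case concrete, here is a minimal sketch of what "looking at meta.xml" amounts to: the document is a zip archive, and the author sits in a dc:creator element inside meta.xml.  The function names and the sample file built in memory are only for illustration and are not part of my setup; the namespace URI is the standard Dublin Core one.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for dc:creator in meta.xml
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_creator(odf_file):
    """Return the dc:creator text from meta.xml in a zip archive, or None."""
    with zipfile.ZipFile(odf_file) as archive:
        meta_xml = archive.read("meta.xml")  # same member that `unzip -p` extracts
    root = ET.fromstring(meta_xml)
    node = root.find(f".//{{{DC_NS}}}creator")
    return node.text if node is not None else None


def make_sample():
    """Build a tiny in-memory archive holding only a meta.xml (illustrative)."""
    meta = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<office:document-meta '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<office:meta><dc:creator>Joe Doe</dc:creator></office:meta>'
        '</office:document-meta>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("meta.xml", meta)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_creator(make_sample()))
```

Because the author arrives as an XML tag, the XML parser can turn it into a metaname.  I have not found an equivalent tag in what the .doc filter hands to swish-e.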

I am currently using the config below, with swish.cgi providing the search 
page.

P.S. Can anyone point me to some tutorials on configuring templates 
(Template Toolkit) for the swish.cgi script?  I am having trouble 
configuring the output from the template.

Thank you,

Jonathan Tan

=============
IndexDir /var/www/test
IndexFile /var/www/test/index.swish-e
IndexName Documents
IndexOnly .xml .htm .html .txt .doc .rtf .sxw .sxc .sxi .odt
DefaultContents TXT
SwishProgParameters -S fs

ReplaceRules replace /var/www/test /test
ExtractPath subject regex !^/test/([^/]+)/.*$!$1!

# Allow extra searching by title, path
MetaNames swishtitle swishdocpath
UndefinedMetaTags auto
PropertyNames dc:creator dc:date


IndexContents TXT* .pdf
FileFilter .pdf "/usr/bin/pdftotext" "'%p' -"

IndexContents TXT* .doc
FileFilter .doc "/usr/bin/catdoc" "-s8859-1 -d8859-1 '%p'"

IndexContents TXT* .rtf
FileFilter .rtf "/usr/bin/catdoc" "'%p'"

#IndexContents TXT* .xls
#FileFilter .xls "/usr/bin/xls2csv" "'%p'"

FileFilterMatch "/usr/bin/unzip" "-p \"%p\" meta.xml" /\.(sxw|sxc|sxi|odt)$/i
IndexContents XML* .sxw .sxc .sxi .odt
StoreDescription XML* <text:p>

FileFilterMatch "/usr/bin/unzip" "-p \"%p\" content.xml" /\.(sxw|sxc|sxi|odt)$/i
IndexContents XML* .sxw .sxc .sxi .odt
StoreDescription XML* <text:p>
=========================
Received on Sun May 22 21:35:01 2005