The funny thing is that *no* FileFilter directives are specified in my
swish_1.conf:
IndexOnly .htm .html .txt .doc .pdf .xls
IndexContents TXT* .txt
DefaultContents HTML*
I can see both /opt/bin/catdoc and /opt/bin/pdftotext, and /opt/bin is
in $PATH, so I presume there must be some hard coding within swish-e
that picks them up without any FileFilter directive being configured.
Should these directives be added?
FileFilter .pdf pdf2html
FileFilter .pdf pdftotext "'%p' -"
FileFilter .doc /opt/bin/catdoc "-s8859-1 -d8859-1 %p"
If not, can the parsing errors be ignored?
Thanks
Michael
Dr Michael Daly wrote on 3/14/12 6:26 AM:
> Here is the contents of the config file:
> IndexDir /share/MD0_DATA/server_dir/Correspondence/2011_Correspondence
> IndexOnly .htm .html .txt .doc .pdf .xls
> IndexContents TXT* .txt
> DefaultContents HTML*
> ParserWarnLevel 9
> #(as I said ParserWarnLevel 1 abolishes the warnings)
> IndexFile /share/MD0_DATA/swish-e-files/swish-e-index/swish_1.index
>
> The command invocations:
> 1. To index:
> swish-e -c /share/MD0_DATA/swish-e-config/swish_1.conf
>
> 2. To search the .index file:
> swish-e -f /share/MD0_DATA/swish-e-index/swish_1.index -w employee
>
Make sure you've read this:
http://swish-e.org/docs/swish-config.html#document_filter_directives
and then post back with any questions.
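
As a rough illustration of what that section of the docs describes, here is a minimal sketch of filter directives for the config quoted above. It rests on several assumptions: that the converters really live at /opt/bin/pdftotext and /opt/bin/catdoc, that both write plain text to stdout, and that the filtered output should go through the TXT* parser instead of the HTML* default. It is untested against this setup, and .xls is left without a filter here.

```
# Hypothetical additions to swish_1.conf (paths assumed from this thread)
# %p is replaced with the full path of the file being indexed; the single
# quotes protect paths that contain spaces.
FileFilter .pdf /opt/bin/pdftotext "'%p' -"
FileFilter .doc /opt/bin/catdoc "-s8859-1 -d8859-1 '%p'"

# Assumption: the filters emit plain text, so parse their output as text
# rather than letting it fall through to DefaultContents HTML*.
IndexContents TXT* .txt .pdf .doc
```

Only one FileFilter line per extension is shown, since two different filters for .pdf would be competing for the same files.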
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users(at)not-real.lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Thu Mar 15 2012 - 02:24:12 GMT