Hi Bill,
Ok, I understand that I need to include a filter file in order to index the
contents of MS Word documents stored on a Unix system... (As I understand
it, this was NOT necessary under SWISH 1.3...)
I've downloaded and compiled "catdoc"... Catdoc is even referenced in one of
the filter files under SWISH-E 2.1...
.../filter-bin/_doc2text.sh
I've installed it in its default location, and made sure that the filter
file is pointing to the correct directory structure...
But which configuration file should I modify so that SWISH-E sees this MS
Word filter file?
Thanks in advance,
Joe Fisher
-----Original Message-----
From: Bill Moseley [mailto:moseley@hank.org]
Sent: Friday, August 17, 2001 12:04
To: Multiple recipients of list
Subject: [SWISH-E] Re: Indexing of word documents, stored on a UNIX
At 11:31 AM 08/17/01 -0700, FISHER,JOSEPH (Non-HP-Roseville,ex1) wrote:
>When I index the documents, everything appears to go through just fine, with
>the following exceptions:
>
> 1) I get a warning message for each file being indexed:
>
> Warning: Possible embedded null in file
>'/case_cr_rpts/docs/dataload/xml_spec3.doc'
Well, without seeing your config, I don't know. To index Word documents you
need to use a filter (or add filtering to your program if indexing with -S
prog).
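For the -S prog route, the indexing program prints each document as a short header block followed by the extracted text. The sketch below assumes the SWISH-E 2.x header names Path-Name, Content-Length (body size in bytes) and Last-Mtime; check the -S prog section of the docs for your version. The names prog_record, emit_tree and the extract_text hook are hypothetical, chosen for illustration only.

```python
import os
import sys
from typing import BinaryIO, Callable, Optional


def prog_record(path: str, text: str, mtime: int) -> bytes:
    """Build one document record for `swish-e -S prog`.

    Assumed header names (verify against your SWISH-E version):
    Path-Name, Content-Length (size of the body in bytes),
    Last-Mtime (seconds since the epoch), then a blank line and the body.
    """
    body = text.encode("utf-8")
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # bytes, not characters
        f"Last-Mtime: {mtime}\n"
        "\n"
    ).encode("utf-8")
    return header + body


def emit_tree(root: str,
              extract_text: Callable[[str], str],
              out: Optional[BinaryIO] = None) -> None:
    """Walk `root` and write one record per .doc file to `out` (stdout by default).

    `extract_text` is a hypothetical hook: pass whatever turns a Word file
    into plain text, for example a function that runs catdoc.
    """
    if out is None:
        out = sys.stdout.buffer
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".doc"):
                continue  # this sketch only handles Word files
            path = os.path.join(dirpath, name)
            mtime = int(os.path.getmtime(path))
            out.write(prog_record(path, extract_text(path), mtime))
```

Counting Content-Length in bytes after encoding matters because the indexer reads exactly that many bytes; counting characters would break on non-ASCII text.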
http://sunsite.berkeley.edu/SWISH-E/2.2/docs/SWISH-CONFIG.html#Document_Filter_Directives
Don't use a shell or perl script to call catdoc -- rather call catdoc
directly as shown in the example. The scripts will kill your indexing
speed.
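The same advice carries over if the filtering lives in your own -S prog program: exec catdoc directly with an argument list instead of going through /bin/sh, which saves one interpreter start per file and sidesteps quoting problems with odd file names. A minimal sketch, assuming catdoc is on the PATH and writes 8-bit text to stdout; catdoc_argv and catdoc_text are hypothetical helper names, and no catdoc options are assumed beyond the file name.

```python
import subprocess
from typing import List


def catdoc_argv(path: str, catdoc: str = "catdoc") -> List[str]:
    # The path is a single argv element, so spaces or quotes in the
    # file name need no escaping and no shell is involved.
    return [catdoc, path]


def catdoc_text(path: str, catdoc: str = "catdoc") -> str:
    """Run catdoc directly (no shell or perl wrapper) and return its output."""
    result = subprocess.run(
        catdoc_argv(path, catdoc),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if catdoc fails
    )
    # Assumption: 8-bit output; latin-1 decodes any byte sequence.
    return result.stdout.decode("latin-1")
```

A function like catdoc_text can serve as the text-extraction step of a -S prog indexer.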
Bill Moseley
mailto:moseley@hank.org
Received on Fri Aug 17 22:22:55 2001