Hi Liam,
I am using Linux, so I am not sure whether this is relevant, but my .conf
entries are slightly different. For example, there is an apostrophe (single
quote) on either side of the %p in each of the following:
FileFilter .pdf /usr/local/bin/pdftotext "'%p' -"
FileFilter .doc /usr/local/bin/catdoc "-s8859-1 -d8859-1 '%p'"
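For a Windows setup, the same quoting idea might be written as in the sketch below. It is untested and only an assumption about how the two lines above would translate: the paths are the ones from the original post, and the quoting (single quotes outside, double quotes around %p) copies the style of the commented-out attempts in that post. The trailing "-" asks pdftotext to write the extracted text to standard output rather than to a file.

```
# Untested sketch: Windows variants of the two Linux FileFilter lines above.
# Paths are taken from the original post; adjust to the local install.
# Double quotes around %p protect file names containing spaces.
FileFilter .pdf c:\SWISH-E\bin\pdftotext.exe '"%p" -'
FileFilter .doc c:\SWISH-E\bin\catdoc.exe '-s8859-1 -d8859-1 "%p"'
```

Running the same command by hand on one sample file is a quick way to check that the filter prints plain text before handing it to swish-e.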
I also have these entries:
# Only the following type of files
IndexOnly .htm .html .txt .doc .pdf
# Tell Swish-e that .txt files are to use the text parser.
IndexContents TXT* .txt
# Otherwise, use the HTML parser
DefaultContents HTML*
# Ask libxml2 to report any parsing errors and warnings or
# any UTF-8 to 8859-1 conversion errors
ParserWarnLevel 9
Rgds
Michael
-----Original Message-----
From: users-bounces@lists.swish-e.org
[mailto:users-bounces@lists.swish-e.org] On Behalf Of Liam Buchanan
Sent: Tuesday, 12 February 2008 11:27 AM
To: users@lists.swish-e.org
Subject: [swish-e] Swish-e not indexing doc or PDF files
Hi,
Hope someone can suggest a solution to this frustrating problem.
We are running swish-e on our development server, which indexes our
production intranet server. The problem is that the indexer cannot
process .doc or PDF files. When it reaches a hyperlink to a PDF or
.doc file, the process halts and produces the error message shown
below (under Output).
Before running swish-e, we first connect to our production server
through a proxy (ntlmaps).
Indexing runs fine for ordinary text.
Here are the specs of our environment --
Swish-e Version:
Swish-e version 2.4.5
------------------------
OS (development and production):
Windows 2000
------------------------
Proxy port application for remote index on another server:
ntlmaps-0.9.9.0.1
------------------------
.conf file contents (the commented-out filters are others I have
tried; we have also attempted to write the results to a text file, but
the output is garbled):
#FileFilter .doc /usr/bin/antiword "'%p'"
#FileFilter .PDF /usr/bin/pdftotext "'%p' -"
#FileFilter .PDF c:\SWISH-E\bin\pdftotext '"%p" -htmlmeta c:\SWISH-E\pdfoutput.txt'
#FileFilter .doc c:\SWISH-E\bin\catdoc.exe '-s8859-1 -d8859-1 "%p" > temp.txt'
FileFilter .doc c:\SWISH-E\bin\catdoc.exe "-s8859-1 -d8859-1 %p"
FileFilter .PDF c:\SWISH-E\bin\pdftotext.exe "%p"
IndexOnly .txt .ps .PDF .html .htm .doc .rtf .xls .mcd .for .ini
IndexOnly .eps .pcm .c .h .cc .m .sh .ppt
IndexOnly .for .cpp
------------------------
Output:
(650 words)
http://*******/dsdweb/v4/apps/web/content.cfm?id=3150 - Using HTML2 parser - http://******/dsdweb/v4/apps/web/content.cfm?id=3150:784: error: Unexpected end tag : a
ML = "[<a href=\"#\" onclick=\"cntCtrlsState(\'hide\'); return false;\">Hide</a>
(523 words)
http://******/dsdweb/v4/apps/web/secure/docs/103.doc - Using TXT2 parser - (no words indexed)
------------------------
At this point the whole process freezes.
Any ideas???
Thanks
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Tue Feb 12 07:20:51 2008