Hi,
I hope someone can suggest a solution to this frustrating problem.
We are running swish-e on our development server to index our
production intranet server. The problem is that the indexer cannot
process .doc or PDF files: when it reaches a hyperlink to a PDF or .doc
file, the process halts with the error message shown below (under
"Output").
Before running swish-e, we first connect to our production server
through a proxy (ntlmaps).
Indexing runs fine for ordinary text content.
Here are the details of our environment:
Swish-e Version:
Swish-e version 2.4.5
------------------------
OS (development and production):
Windows 2000
------------------------
Proxy application used to index the remote (production) server:
ntlmaps-0.9.9.0.1
------------------------
.conf file contents (the commented-out filters are others I have
tried; we have also tried writing the filter output to a text file, but
the output is garbled):
#FileFilter .doc /usr/bin/antiword "'%p'"
#FileFilter .PDF /usr/bin/pdftotext "'%p' -"
#FileFilter .PDF c:\SWISH-E\bin\pdftotext '"%p" -htmlmeta c:\SWISH-E\pdfoutput.txt'
#FileFilter .doc c:\SWISH-E\bin\catdoc.exe '-s8859-1 -d8859-1 "%p" > temp.txt'
FileFilter .doc c:\SWISH-E\bin\catdoc.exe "-s8859-1 -d8859-1 %p"
FileFilter .PDF c:\SWISH-E\bin\pdftotext.exe "%p"
IndexOnly .txt .ps .PDF .html .htm .doc .rtf .xls .mcd .for .ini
IndexOnly .eps .pcm .c .h .cc .m .sh .ppt
IndexOnly .for .cpp
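As we understand the swish-e documentation, a FileFilter program must write its extracted text to standard output, and pdftotext only does that when its output-file argument is "-"; without it, pdftotext writes a .txt file next to the input instead. A minimal Python sketch (assuming Python 3; `expand_filter` is a hypothetical helper that imitates, but is not, swish-e's own command-line handling) prints the argument list each filter line would produce once %p is replaced by the file path:

```python
import shlex


def expand_filter(program, options, path):
    """Build an argv list from a FileFilter-style program and option string.

    Hypothetical helper: the option string is split like a POSIX shell
    would split it, then every %p placeholder is replaced by the file path.
    Substituting after splitting keeps paths with spaces or backslashes
    intact. This mimics, but is not, swish-e's own handling.
    """
    args = shlex.split(options)
    return [program] + [arg.replace("%p", path) for arg in args]


if __name__ == "__main__":
    pdf = r"C:\temp\103 report.pdf"  # hypothetical downloaded file
    # Our active .PDF line: no output-file argument, so pdftotext writes
    # a .txt file alongside the input and nothing to stdout.
    print(expand_filter(r"c:\SWISH-E\bin\pdftotext.exe", "%p", pdf))
    # Variant with "-" as the output file: pdftotext writes to stdout.
    print(expand_filter(r"c:\SWISH-E\bin\pdftotext.exe", "'%p' -", pdf))
    # Our active .doc line for comparison.
    print(expand_filter(r"c:\SWISH-E\bin\catdoc.exe", "-s8859-1 -d8859-1 %p", pdf))
```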
------------------------
Output:
(650 words)
http://*******/dsdweb/v4/apps/web/content.cfm?id=3150 - Using HTML2 parser - http://******/dsdweb/v4/apps/web/content.cfm?id=3150
:784: error: Unexpected end tag : a
ML = "[<a href=\"#\" onclick=\"cntCtrlsState(\'hide\'); return false;\">Hide</a>
(523 words)
http://******/dsdweb/v4/apps/web/secure/docs/103.doc - Using TXT2 parser - (no words indexed)
------------------------
At this point the whole process freezes.
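To narrow down whether it is swish-e or one of the filter programs that hangs, a minimal Python sketch (assuming Python 3.7+; `probe_filter` is a hypothetical helper, not part of swish-e) runs a single filter command by hand with stdin closed and a time limit, and reports whether it exits, its exit code, and how many bytes it wrote to stdout:

```python
import subprocess
import sys


def probe_filter(argv, timeout=30.0):
    """Run one filter command and report how it behaves.

    Returns ("exited", returncode, stdout_bytes) if the program finishes,
    or ("timeout", None, 0) if it is still running after `timeout` seconds
    (subprocess.run kills it in that case).
    """
    try:
        result = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,  # a filter should never wait for console input
            capture_output=True,       # collect stdout/stderr as raw bytes
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None, 0)
    return ("exited", result.returncode, len(result.stdout))


if __name__ == "__main__":
    # Usage (hypothetical paths):
    #   python probe_filter.py c:\SWISH-E\bin\catdoc.exe C:\temp\103.doc
    if len(sys.argv) > 1:
        print(probe_filter(sys.argv[1:]))
```

A result of ("timeout", None, 0) would point at the filter program rather than at swish-e; ("exited", 0, 0) would mean the filter ran but sent nothing to stdout.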
Any ideas?
Thanks
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Mon Feb 11 19:27:00 2008