Liam Buchanan wrote on 2/11/08 6:26 PM:
> Hi,
> Hope someone can suggest a solution to this frustrating problem.
> We are running swish-e on our development server to index our
> production intranet server. The problem is that the indexer cannot
> process .doc or PDF files: when the crawl reaches a hyperlink to a
> PDF or .doc file, the process halts and produces the error message
> below (under output).
> Before running swish-e, we connect to our production server through a
> proxy (ntlmaps).
It isn't clear to me how you are aggregating your documents. Are you using
spider.pl, or some other crawler?
The FileFilter config directive can work at odds with the SWISH::Filter support
in spider.pl, effectively converting non-text files twice.
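One way to spot double conversion is to look at what the second filter actually receives. Below is a minimal sketch in Python, assuming the usual file signatures (`%PDF-` for PDF, the OLE2 header for legacy .doc). The function names are hypothetical and are not part of swish-e or SWISH::Filter.

```python
# Hypothetical check: does a document still need binary-to-text conversion?
# If spider.pl's SWISH::Filter already turned a PDF into text, a second,
# extension-based FileFilter would be handed text it cannot parse.

PDF_MAGIC = b"%PDF-"
# Legacy .doc (and other Office 97-2003) files are OLE2 compound documents
# that start with this 8-byte signature.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def needs_conversion(content: bytes) -> bool:
    """Return True if content still looks like a raw PDF or OLE2 (.doc) file."""
    return content.startswith(PDF_MAGIC) or content.startswith(OLE2_MAGIC)


def describe(path: str, content: bytes) -> str:
    """Report whether a document named like a PDF/DOC is still binary."""
    named_binary = path.lower().endswith((".pdf", ".doc"))
    if needs_conversion(content):
        return f"{path}: raw binary, filter once"
    if named_binary:
        return f"{path}: extension says binary, but content is already converted"
    return f"{path}: plain content"


if __name__ == "__main__":
    print(describe("report.pdf", b"%PDF-1.4 ..."))
    print(describe("report.pdf", b"Quarterly report text"))
```

The second case is the one to watch for: a path ending in .pdf whose content is already text suggests something upstream has converted it.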
Try indexing a single troublesome document. Break the process down into steps
(fetching the doc, feeding it to swish-e, etc.) and turn on verbosity and the
-T debugging options.
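To separate the fetch step from the indexing step, one option is to fetch one document yourself and hand it to swish-e as a single "prog" record. This is a minimal Python sketch, assuming the swish-e prog input headers `Path-Name`, `Content-Length` and `Last-Mtime` followed by a blank line; the script name, function names and the proxy URL are hypothetical placeholders, not part of swish-e or spider.pl.

```python
# Hypothetical helper: fetch one document (optionally through a proxy such as
# ntlmaps) and write it to stdout as one swish-e "prog" record.
import sys
import time
import urllib.request
from typing import Optional


def fetch(url: str, proxy: Optional[str] = None) -> bytes:
    """Fetch a single URL, optionally through an HTTP proxy."""
    handlers = []
    if proxy:
        # The proxy address is an assumption, e.g. "http://localhost:PORT".
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=30) as resp:
        return resp.read()


def prog_record(path_name: str, content: bytes, mtime: Optional[int] = None) -> bytes:
    """Build one document record: header lines, a blank line, then raw content."""
    if mtime is None:
        mtime = int(time.time())
    header = (
        f"Path-Name: {path_name}\n"
        f"Content-Length: {len(content)}\n"
        f"Last-Mtime: {mtime}\n"
        "\n"
    )
    return header.encode("ascii") + content


def main(argv: list) -> None:
    url = argv[1]
    proxy = argv[2] if len(argv) > 2 else None
    sys.stdout.buffer.write(prog_record(url, fetch(url, proxy)))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Running something like `python fetch_one.py URL PROXY | swish-e -S prog -i stdin -v 3` (plus whichever -T options you need) then exercises only the swish-e side for that one file; if the fetch itself fails, the problem lies before swish-e.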
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Mon Feb 11 22:00:54 2008