On Fri, 24 Jan 2003, Smith, Doug wrote:
> I've spent several frustrating hours debugging an index job that uses
> spider.pl, and having found the solution, I thought I'd share it to
> save others the trouble. I have a site of about 1,000 links, mostly
> HTML and PDF files. I used the built-in spider.conf and the filter as
> recommended in the docs. (swish-e 2.2.3, RedHat 8.0 - 2.4.18.) It
> worked wonderfully on the development server, then failed on the new
> production server (of course). The spider process failed on several
> of the PDF files, with a message "err: External program failed to
> return required headers Path-Name: & Content-Length:".
That normally means that the Content-Length of the previous "file" sent to
swish was not correct.
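
To illustrate the kind of mismatch I mean, here is a minimal Python sketch. It is not the actual spider.pl or swish-e code: `build_record` and `declared_length` are made-up names, and only the two header names from the error message are taken as given. The point is that a length counted in characters comes out smaller than the number of bytes written as soon as the text contains multi-byte characters, so a reader that trusts the header would presumably stop short and try to parse the leftover bytes as the next document's headers.

```python
# Illustration only: why a character count makes a bad Content-Length.
# The header names come from the error message; everything else is assumed.

def build_record(path_name: str, text: str, count_bytes: bool = True) -> bytes:
    """Return one header-plus-body record for a single document."""
    body = text.encode("utf-8")
    # The length must describe the bytes actually written, not the characters.
    length = len(body) if count_bytes else len(text)
    header = f"Path-Name: {path_name}\nContent-Length: {length}\n\n"
    return header.encode("ascii") + body


def declared_length(record: bytes) -> int:
    """Read the Content-Length value back out of a record's headers."""
    header, _, _ = record.partition(b"\n\n")
    for line in header.split(b"\n"):
        name, _, value = line.partition(b":")
        if name == b"Content-Length":
            return int(value.decode("ascii").strip())
    raise ValueError("no Content-Length header")


text = "r\u00e9sum\u00e9"  # 6 characters, 8 bytes in UTF-8
good = build_record("doc1.pdf", text)
bad = build_record("doc1.pdf", text, count_bytes=False)
```

Here `declared_length(good)` matches the body size, while `declared_length(bad)` is two bytes short, which is the sort of discrepancy that could leave the reader out of step for the following document.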
> I took one of the offending PDFs and ran it through pdf2html.pm.
> That failed too, on a "tr / ..." line 201. After much hunting I
> discovered that the LANG environment variable on the production server
> was "en_US.UTF-8", while the dev server was simply "en_US". When I
> removed the "UTF-8" from the production box, it worked great! So, it
> appears that pdf2html.pm wants to do its transliteration in Unicode
> rather than UTF-8, at least, that's my uneducated guess.
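
If the locale does turn out to be the trigger, one option is to change LANG only for the indexing job rather than for the whole box. A rough Python sketch of that idea follows; `strip_codeset` and `env_without_codeset` are hypothetical helper names, and the sketch assumes the job is launched from a wrapper script.

```python
import os


def strip_codeset(locale_name: str) -> str:
    """Drop a codeset suffix: 'en_US.UTF-8' becomes 'en_US'."""
    return locale_name.split(".", 1)[0]


def env_without_codeset(base=None) -> dict:
    """Return a copy of the environment with the codeset removed from LANG.

    The caller's environment is not modified. LC_ALL or LC_CTYPE, if set,
    take precedence over LANG and are left untouched here.
    """
    env = dict(os.environ if base is None else base)
    if "LANG" in env:
        env["LANG"] = strip_codeset(env["LANG"])
    return env
```

The returned dictionary can be passed as the `env=` argument of `subprocess.run` when starting the indexing command, so only that child process sees the changed locale.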
So what was happening? When you say "fail" did Perl give an error
message?
--
Bill Moseley moseley@hank.org
Received on Sun Jan 26 18:59:11 2003