Re: [swish-e] err: External program failed to return required headers Path-Name (Swish-e 2.4.5)

From: Clint <clintw(at)not-real.uyaphi.com>
Date: Thu Mar 29 2007 - 13:51:24 GMT
Hi Rene

Thank you for the reply.

Just to let you know, the following code:

my $bytecount = length($$content);

did the trick. Swish-e now includes the database content as well.
I'm just glad it's working now. Thanks so much. ;-))

Regards
Clint
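
The variants of the $bytecount line discussed in this thread differ in whether they count characters or bytes. The sketch below is not taken from spider.pl; the sample string and the variable names are made up for illustration. It only shows that Perl's length() can report two different numbers for the same text, which is the kind of mismatch that makes a Content-Length header wrong.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical page body: "René" as raw UTF-8 bytes (the é takes 2 bytes).
my $raw        = "Ren\xC3\xA9";
my $raw_length = length $raw;              # raw bytes: counts bytes

# After decoding, the same text is a string of characters.
my $decoded = decode('UTF-8', $raw);
my $chars   = length $decoded;             # decoded string: counts characters

# Encoding again gives the bytes that would actually be written out as UTF-8.
my $bytes = length encode('UTF-8', $decoded);

print "raw length: $raw_length\n";
print "characters: $chars\n";
print "bytes:      $bytes\n";
```

Which count is right for a given setup depends on whether $$content has already been decoded and on how it is encoded when it is printed, so the header value has to match the bytes that actually reach Swish-e.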


Rene.Kloos@esa.int wrote:
> Hello Clint,
>
> Although Bill Moseley can give you the most accurate answer, I can already
> say that I stumbled into the same problem at one point. In your setup the
> spider creates an output file which is used as an input file for Swish-e to
> index. This file requires several headers to be present for every spidered
> page, e.g. path name and content-length. The content-length value is taken by
> Swish-e to read in the next <content-length> characters. The fact that your
> warning contains 'h-Name' shows that Swish-e reads in 3 characters too many,
> i.e. 'Pat', so Swish-e doesn't find the next 'Path-Name' header where it
> expects to find it. This means that the value listed in the content-length
> header is not in accordance with the actual content-length.
>
> I guess this has to do with the UTF-8/Latin-1 issue when using libxml2, but I
> am certainly no expert in that area :-)
>
> In one of the posts it is suggested to modify the spider.pl:
>
> my $bytecount = length pack 'c0a*', $$content;
>
> should become:
>
> my $bytecount = do { use bytes; length($$content) };
>
> This did actually NOT do the trick for me. The following DID:
>
> my $bytecount = length($$content);
>
> I have been happy indexing ever since (static pages, not dynamic ones).
>
> Hope this helps,
> René
>
> users-bounces@lists.swish-e.org wrote on 29/03/2007 12:09:32:
>
>   
>> SWISH-E 2.4.5
>>
>> Linux 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 23 13:01:26 EST 2007 i686 i686
>> i386 GNU/Linux
>>
>>
>> I initially indexed only static pages, which worked fine. However, it has
>> become necessary to index the database-driven pages as well.
>>
>> I set up spider.pl and got as far as having it generate the output.txt
>> file, which is around 40MB+, using
>> /usr/local/lib/swish-e/spider.pl default http://my_server.com/index.html > output.txt
>> No errors were reported.
>>
>> But now when I run
>> swish-e -c config -S prog -i stdin < output.txt
>>
>> I get this fatal error soon after
>>
>> Warning: Unknown header line: 'h-Name: http://www.xxx.xxx/xx.htm'
>> from program spider.pl
>> err: External program failed to return required headers Path-Name:.
>>
>> I have looked up this error, but the posts are from 2003-2005 and,
>> although they explain possible reasons why this is happening, they
>> don't really show how to fix or work around this error.
>>
>> I'm only indexing html text files and text from dynamic pages, not
>> images, pdfs or anything like that.
>>
>> How does one fix this?
>>
>> Regards
>> Clint
>>
>>
>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users@lists.swish-e.org
>> http://lists.swish-e.org/listinfo/users
>>     
Received on Thu Mar 29 09:51:04 2007
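
The off-by-three read that Rene describes in the quoted reply can be reproduced with a toy reader. This is a minimal sketch, not Swish-e's actual parser: the function name first_header_lines, the example.invalid URLs, and the two sample bodies are all hypothetical, and only the Path-Name and Content-Length headers mentioned in the thread are used. The first Content-Length is deliberately 3 too large.

```perl
use strict;
use warnings;

# Two made-up documents in "headers, blank line, body" form.
my $body1  = "<html>one</html>\n";
my $body2  = "<html>two</html>\n";
my $stream = "Path-Name: http://example.invalid/1.htm\n"
           . "Content-Length: " . (length($body1) + 3) . "\n\n" . $body1   # wrong on purpose
           . "Path-Name: http://example.invalid/2.htm\n"
           . "Content-Length: " . length($body2) . "\n\n" . $body2;

# Return the first header line of every document the toy reader finds.
sub first_header_lines {
    my ($data) = @_;
    my $pos = 0;
    my @seen;
    while ($pos < length $data) {
        my $end = index($data, "\n\n", $pos);     # blank line ends the headers
        last if $end < 0;
        my @headers = split /\n/, substr($data, $pos, $end - $pos);
        push @seen, $headers[0];
        my $len;
        for my $h (@headers) {
            $len = $1 if $h =~ /^Content-Length:\s*(\d+)/;
        }
        last unless defined $len;
        $pos = $end + 2 + $len;                   # skip the declared body size
    }
    return @seen;
}

print "$_\n" for first_header_lines($stream);
```

Because the reader trusts the declared length, it swallows 'Pat' from the next document, and the second line it reports starts with 'h-Name:', matching the warning in the original post.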