Hi,
Exactly the same happens *with* your patch...
> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Monday, March 31, 2003 20:24
> To: Nuno Ferreira
> Cc: 'Multiple recipients of list'
> Subject: RE: [SWISH-E] External program failed to return
> required headers Path-Name: & Content-Length:
>
>
> On Mon, 31 Mar 2003, Nuno Ferreira wrote:
>
> > I am not the sysadmin of the remote sites. I'll try to speak to them.
> > I can test any patch that you want me to try.
>
> You can just make a local copy of spider.pl, so you shouldn't need the
> help of the sysadmin.
>
> Then, as long as you are running something like Perl 5.6.1 or newer, look
> in spider.pl for:
>
> my $headers = join "\n",
> 'Path-Name: ' . $uri,
> 'Content-Length: ' . length $$content,
> '';
>
> and replace it with something like:
>
> my $doc_length = do { use bytes; length $$content };
>
> my $headers = join "\n",
> 'Path-Name: ' . $uri,
> 'Content-Length: ' . $doc_length,
> '';
>
> I suppose you might even be able to just place:
>
> use bytes;
>
> toward the top of spider.pl and it would work, too. But there might be
> some other side effects, so the above might be a safer fix for now.
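
A minimal, self-contained sketch of the replacement suggested above, for anyone who wants to try it outside spider.pl first. The helper names byte_length and build_headers and the example URL are hypothetical, not part of spider.pl; the only assumption about Perl is that the "bytes" pragma is available (5.6 or newer).

```perl
use strict;
use warnings;

# Hypothetical helper: length of the string in bytes, as Perl stores it.
sub byte_length {
    my ($content_ref) = @_;
    use bytes;    # lexically scoped, so it only affects this sub
    return length $$content_ref;
}

# Hypothetical helper: builds the two headers discussed in this thread.
sub build_headers {
    my ($uri, $content_ref) = @_;
    return join "\n",
        'Path-Name: ' . $uri,
        'Content-Length: ' . byte_length($content_ref),
        '';
}

# chr(400) is a wide character, so this string is stored as UTF-8:
# 4 characters, but 5 bytes.
my $content = "caf" . chr(400);
my $headers = build_headers('http://www.example.com/', \$content);
print $headers;
```

Note the concatenation with "." in the Content-Length line: a variable placed inside single quotes is not interpolated, so the value has to be appended rather than embedded.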
>
>
> >
> > Regards,
> > Nuno
> >
> > > -----Original Message-----
> > > From: Bill Moseley [mailto:moseley@hank.org]
> > > Sent: Monday, March 31, 2003 15:35
> > > To: Nuno Ferreira
> > > Cc: Multiple recipients of list
> > > Subject: Re: [SWISH-E] External program failed to return
> > > required headers Path-Name: & Content-Length:
> > >
> > >
> > > On Mon, 31 Mar 2003, Nuno Ferreira wrote:
> > >
> > > > It starts and it looks like it is doing everything I want, then it
> > > > suddenly crashes with:
> > > > <SNIP>
> > > > Looking at extracted tag '<td background="/images/verao_foo_d.jpg">'
> > > > ! Found 0 links in
> > > > http://www.somesite.com/catalog/formas.php?PHPSESSID=85c724f87fc7f0e68425e6454bb4e11d
> > > >
> > > > http://www.somesite.com/catalog/detras_loja.php?PHPSESSID=85c724f87fc7f0e68425e6454bb4e11d - Using DEFAULT (HTML2) parser - (565 words)
> > > > err: External program failed to return required headers Path-Name: & Content-Length:
> > > > .
> > > > </SNIP>
> > > >
> > > > It always crashes in the same place. If I spider a different site, it
> > > > also crashes, and always in the same place.
> > > > I've found this thread <http://swish-e.org/archive/3817.html> that is
> > > > related to my problem, but after reading it I became even more confused
> > > > because now I know that I may be looking at the wrong debug line because
> > > > of the buffering issues.
> > >
> > > First, see if this is a possible fix:
> > >
> > http://swish-e.org/archive/4870.html
> >
> >
> > If you set debug => DEBUG_URL then it will display the URLs as they are
> > fetched and before swish gets the document. That should help find the
> > exact document where the problem is happening.
> >
> > But that error "failed to return required headers" is likely due to the
> > *previous* document returning the wrong content length. The way extprog
> > works is it reads line-by-line to read the headers. Then when it sees a
> > blank line (that marks the end of the headers) it reads content-length
> > bytes in from the external program and starts over.
> >
> > If that content length is one byte short, and the last byte of the doc
> > is a \n, then when it starts to read the next doc it will see just \n
> > and assume that's the end of the headers. But at that point no
> > Content-Length or Path-Name header is set, so the program aborts with
> > that error.
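
The read loop described above can be sketched roughly as follows. This is only an illustration of the description in this thread, not swish-e's actual code: read_docs, the in-memory stream, and the sample documents are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical reader modelled on the description above.
sub read_docs {
    my ($stream) = @_;
    open my $fh, '<', \$stream or die "cannot open in-memory stream: $!";
    binmode $fh;
    my @docs;
    while (1) {
        my %header;
        my $saw_line = 0;
        # Read headers line by line until a blank line.
        while (defined(my $line = <$fh>)) {
            $saw_line = 1;
            chomp $line;
            last if $line eq '';
            my ($name, $value) = split /:\s*/, $line, 2;
            $header{$name} = $value;
        }
        last unless $saw_line;    # clean end of input
        die "failed to return required headers Path-Name: & Content-Length:\n"
            unless defined $header{'Path-Name'}
                && defined $header{'Content-Length'};
        # Read exactly Content-Length bytes, then start over.
        read($fh, my $body, $header{'Content-Length'});
        push @docs, { path => $header{'Path-Name'}, body => $body };
    }
    return @docs;
}

# Two documents; the first body is "hello\n", which is 6 bytes.
my $good = "Path-Name: a\nContent-Length: 6\n\nhello\n"
         . "Path-Name: b\nContent-Length: 3\n\nabc";

# Same stream, but the first length is one byte short.
(my $bad = $good) =~ s/Content-Length: 6/Content-Length: 5/;

my @docs  = read_docs($good);
my $ok    = eval { read_docs($bad); 1 };
my $error = $ok ? '' : $@;

print scalar(@docs), " documents read from the good stream\n";
print "short length gives: $error" if $error;
```

With the short length, the leftover "\n" from the first document is read as the blank line that ends the (empty) headers of the next one, so the error is reported one document later than the one that caused it.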
> >
> > I suspect what is happening is that the previous document has a wide
> > char, forcing perl into UTF-8 encoding. spider.pl is using "length" to
> > determine the length of the string, but that's the character length, not
> > the byte length:
> >
> > $ perl -MDevel::Peek -e '$x=chr(400);Dump($x);print "len=", length$x, "\n"'
> > SV = PV(0x80f6344) at 0x80fd2a4
> > REFCNT = 1
> > FLAGS = (POK,pPOK,UTF8)
> > PV = 0x80f9e58 "\306\220"\0
> > CUR = 2
> > LEN = 3
> > len=1
> >
> > So the length of the string is two bytes, but "length" is returning one.
> > That would result in your problem.
> >
> > I need to find a portable way for use with all versions of Perl to read
> > the correct byte length.
> >
> >
> >
>
> --
> Bill Moseley moseley@hank.org
>
>
>
Received on Tue Apr 1 09:56:34 2003