On Wed, Oct 27, 2010 at 10:31 AM, Troy Wical <troy@wical.com> wrote:
> > Warning: Unknown header line: 'ath-Name:
> http://type2.com/ezmlm-archives/index.cgi?list=type2&cmd=monthbydate&month=201009'
> from program spider.pl
> > err: External program failed to return required headers Path-Name:
> Regarding the "Unknown header line" error, I'm having a heck of a time
> finding anything related to that. Full debugging has been activated, and
> I've gone through and tried to look at the previous URL's that may be
> throwing it off, but no luck yet. Maybe a break in the garage, playing
> mechanic, will help me come back refreshed.
>
I have not looked at that code in, well, years. Swish *should* be working
with bytes, so my guess is that the spider is telling swish that the content
is one byte longer than it really is. That would explain why the leading "P"
of the next Path-Name header goes missing.
http://dev.swish-e.org/browser/swish-e/trunk/prog-bin/spider.pl.in#L1409
    # Re-encode the data for outside of Perl
    eval {
        # Need to only require Encode here?
        $$content = Encode::encode( $server->{charset}, $$content )
            if $server->{charset};
    };
    if ( $@ ) {
        print STDERR "Warning: document '", $response->request->uri, "' could not be encoded to charset '$server->{charset}'\n";
        delete $server->{charset};
    }

$content should now be a reference to a string of bytes.

    $server->{counts}{'Total Bytes'} += length $$content;
    $server->{counts}{'Total Docs'}++;


    # ugly and maybe expensive, but perhaps more portable than "use bytes"
    my $bytecount = length pack 'C0a*', $$content;

This is a wild guess, but what if you replace that with:

    my $bytecount = length $$content;

It's probably the same, but that's how I would get the length of a string
of bytes.
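
To show why plain length should be enough once the data has been through
Encode::encode, here is a minimal sketch. The helper name
char_and_byte_length and the sample string are made up for illustration and
are not part of spider.pl. On a decoded character string, length counts
characters; on the encoded result, it counts bytes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return (character count, byte count) for a
# decoded Perl string when it is encoded to the given charset.
sub char_and_byte_length {
    my ( $text, $charset ) = @_;
    my $chars = length $text;                        # counts characters
    my $bytes = Encode::encode( $charset, $text );   # now a byte string
    return ( $chars, length $bytes );                # length counts bytes here
}

# "cafe" with an e-acute: 4 characters, 5 bytes in UTF-8.
my ( $chars, $bytes ) = char_and_byte_length( "caf\x{e9}", 'UTF-8' );
print "characters: $chars, bytes: $bytes\n";
```

If the Content-Length header were ever computed from one of these numbers
and the body written with the other, swish would read too much or too little.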
The other thing, if you really want to battle this, is to write the spider's
output to a file and then use an editor to try to figure out the length
difference -- or maybe just add an extra space character before the
Path-Name line where it's failing and then feed that to swish.
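
Rather than hunting through the file in an editor, a small script could find
the first record whose length is off. This is only a sketch under
assumptions: it takes each record to be header lines starting with
Path-Name:, then a blank line, then exactly Content-Length bytes of
document. The sub name check_spider_dump and the way the file is passed in
are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical checker: walk a saved spider dump and report the first
# record whose Content-Length does not land on the next Path-Name header.
sub check_spider_dump {
    my ($data) = @_;    # raw bytes of the saved spider output
    my @problems;
    my $pos = 0;
    my $len = length $data;

    while ( $pos < $len ) {
        # Assumption: every record begins with a Path-Name header.
        if ( substr( $data, $pos, 10 ) ne 'Path-Name:' ) {
            push @problems, "offset $pos: expected Path-Name header";
            last;
        }

        # Headers end at the first blank line.
        my $end = index( $data, "\n\n", $pos );
        if ( $end < 0 ) {
            push @problems, "offset $pos: no blank line after headers";
            last;
        }
        my $head = substr( $data, $pos, $end - $pos );

        my ($path)  = $head =~ /^Path-Name:[ \t]*(.*)$/m;
        my ($bytes) = $head =~ /^Content-Length:[ \t]*(\d+)[ \t]*$/m;
        unless ( defined $bytes ) {
            push @problems, "$path: no Content-Length header";
            last;
        }

        # Skip the headers, the blank line, and the declared body.
        $pos = $end + 2 + $bytes;

        if ( $pos > $len ) {
            push @problems, "$path: Content-Length $bytes runs past end of file";
            last;
        }
        if ( $pos < $len && substr( $data, $pos, 10 ) ne 'Path-Name:' ) {
            my $found = substr( $data, $pos, 20 );
            push @problems,
                "$path: Content-Length $bytes ends at '$found', not at a Path-Name header";
            last;
        }
    }
    return @problems;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    my @problems = check_spider_dump($data);
    print @problems ? map { "$_\n" } @problems : "All records line up\n";
}
```

The file is read with the :raw layer so that offsets are counted in bytes.
The reported Path-Name would be the document whose length is wrong, which is
the one just before the record swish complains about.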
--
Bill Moseley
moseley@hank.org
Received on Wed Oct 27 16:18:24 2010