[swish-e] Unknown header line

From: at <Clint>
Date: Fri, 07 Oct 2011 15:30:38 +0200
Hi,

Swish-e no longer updates the index after spider.pl has run.
It ran perfectly for more than two years, but it has now started to abort
with an error I originally saw when I first started using it.

This is on Linux.

I first run this on the command line:
/usr/local/lib/swish-e/spider.pl > output.txt

and then
swish-e -c swish.conf -S prog -i stdin < output.txt

but this aborts after a while with:

Warning: Unknown header line: 'om/linking/' from program stdin
err: External program failed to return required headers Path-Name:
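
For anyone trying to reproduce this, here is a minimal sketch of how a reader of this kind of stream can lose its place. It is Python rather than swish-e's own parser, and it assumes each record is a block of header lines, a blank line, and then exactly Content-Length bytes of content. The names read_records, record and KNOWN are made up for illustration, and the header list is an assumption apart from Path-Name.

```python
import io

# Header names assumed for illustration; only Path-Name appears in the error above.
KNOWN = {"Path-Name", "Content-Length", "Last-Mtime", "Document-Type"}


def read_records(stream):
    """Yield (path, body) pairs from a binary stream of header/body records."""
    while True:
        line = stream.readline()
        if not line:
            return  # clean end of input
        headers = {}
        while line.strip():  # a blank line ends the header block
            name, sep, value = line.decode("latin-1").partition(":")
            if not sep or name not in KNOWN:
                raise ValueError(f"Unknown header line: {line.strip()!r}")
            headers[name] = value.strip()
            line = stream.readline()
        if "Path-Name" not in headers or "Content-Length" not in headers:
            raise ValueError("required headers missing")
        # Exactly this many bytes are treated as the document body.
        yield headers["Path-Name"], stream.read(int(headers["Content-Length"]))


def record(path, body, declared=None):
    """Build one record; `declared` overrides the true byte length."""
    size = len(body) if declared is None else declared
    return f"Path-Name: {path}\nContent-Length: {size}\n\n".encode("ascii") + body


second = record("http://www.site.com/linking/", b"abc")
good = record("http://www.site.com/index.htm", b"<html>a</html>") + second
print(list(read_records(io.BytesIO(good))))
```

In this sketch the first body has no trailing newline and still parses, because the reader counts bytes rather than lines. If the declared length is larger than the bytes actually written, the reader swallows the start of the next Path-Name line and then fails on whatever fragment is left.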

I have tried each of these three options individually in spider.pl:

    # Option 1
    my $bytecount = length pack 'C0a*', $$content;

    # Option 2
    my $bytecount = length($$content);

    # Option 3
    use bytes;
    $bytecount = length $$content;

and I get the same result each time.
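
To illustrate what the three variants are trying to get right: the length written in the header has to match the number of bytes actually written, which differs from the number of characters as soon as a page contains non-ASCII text. The sketch below is Python, not the spider.pl code, and build_record is a hypothetical helper. It assumes the content is encoded once and that the same encoded bytes are both measured and written.

```python
def build_record(path: str, text: str, encoding: str = "utf-8") -> bytes:
    """Encode first, then measure: the length header must count bytes."""
    body = text.encode(encoding)
    header = f"Path-Name: {path}\nContent-Length: {len(body)}\n\n"
    # Assumes an ASCII-only path; the body keeps the chosen encoding.
    return header.encode("ascii") + body


text = "caf\u00e9"  # four characters, one of them non-ASCII
# Character count, UTF-8 byte count, Latin-1 byte count
print(len(text), len(text.encode("utf-8")), len(text.encode("latin-1")))
print(build_record("http://www.site.com/index.htm", text))
```

The point of the design is that measuring and writing happen on the same byte string. If the count is taken from one representation and the output is written in another, the header and the body disagree.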

If I look at the output.txt file, I can see that in some of the entries
"Path-Name" is not on a line of its own, but instead sits right after
the closing </html> tag of the previous entry.

e.g.

<!-- InstanceEnd -->
</html>Path-Name: http://www.site.com/index.htm

and not like

<!-- InstanceEnd -->
</html>
Path-Name: http://www.site.com/index.htm

Is it doing this because some of the pages don't end with a newline,
or does this have to do with page encoding or the multi-byte issue
I've seen mentioned?
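
One way to see how widespread this is would be to list every place in output.txt where "Path-Name: " does not start a line. The sketch below is Python, and glued_headers is a made-up name. As far as I understand the format, a glued header only means the previous page had no trailing newline, so this narrows down which pages to inspect rather than proving that they are the cause.

```python
import re

# "Path-Name: " preceded by any byte other than a newline.
GLUED = re.compile(rb"[^\n]Path-Name: ")


def glued_headers(data: bytes):
    """Return 1-based line numbers where Path-Name does not start the line."""
    return [data.count(b"\n", 0, m.start()) + 1 for m in GLUED.finditer(data)]


sample = b"<!-- InstanceEnd -->\n</html>Path-Name: http://www.site.com/index.htm\n"
print(glued_headers(sample))
```

For the real file, read it in binary mode, for example with open("output.txt", "rb"), so that no decoding step alters the bytes.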

Since nothing has changed on the server, I assume it must be an issue
with some of the web pages.

I am stuck; please help. Thanks.

Received on Fri Oct 07 2011 - 13:29:37 GMT