Re: win2k unknown header problem

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Sep 26 2002 - 19:36:45 GMT

At 11:44 AM 09/26/02 -0700, David L Norris wrote:
>Bill, spider.pl shouldn't be returning any contents along with a
>nocontents, right?  I think this is the problem.

Logically speaking, yes, that's the problem!

It's that old mixed-parsers problem again.  When using the HTML parser, swish
expects the file to be in memory.  So with -S prog I just emulated what the
http and fs input methods do and returned the document.

The problem was that Matt wasn't specifying a parser, so swish wasn't
passing the document off to the libxml2 parser, which would have flushed the
input stream.  (Remember, I changed the default parser to be libxml2 if it's
linked in.)

If sunsite's mail weren't so slow you would have seen my response -- I agree
that it would be better to have spider.pl return only <title> as the
contents -- then it would mimic NoContents.  As I said, there's no point in
returning content that's just going to get flushed anyway.
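
To make the idea concrete, here is a rough sketch (in Python rather than
Perl, and not the actual spider.pl code) of a -S prog style record that
carries only the <title> instead of the whole page.  The function name is
made up for illustration, and the Path-Name and Content-Length header
names are written from memory of the -S prog docs, so treat them as
assumptions and check them against your version.

```python
import re


def title_only_record(url: str, html: str) -> bytes:
    """Build one record holding only the page's <title>, not the full body.

    Hypothetical helper for illustration; header names are assumed.
    """
    # Pull out the title text, tolerating attributes, case, and newlines.
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    title = match.group(1).strip() if match else ""

    # The body is just the title element; everything else is dropped.
    body = f"<title>{title}</title>".encode("utf-8")

    # Headers, then a blank line, then exactly Content-Length bytes of body.
    headers = (
        f"Path-Name: {url}\n"
        f"Content-Length: {len(body)}\n"
        "\n"
    ).encode("utf-8")
    return headers + body
```

The length is counted in bytes after encoding, since the reader on the
other end of the pipe consumes bytes, not characters.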


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Thu Sep 26 19:41:40 2002