At 11:44 AM 09/26/02 -0700, David L Norris wrote:
>Bill, spider.pl shouldn't be returning any contents along with a
>nocontents, right? I think this is the problem.
Logically speaking, yes, that's the problem!
It's that old mix-of-parsers problem again. When using the HTML parser, swish
expects the file to be in memory. So with -S prog I just emulated what the
http and fs input methods do and returned the document.
The problem was that Matt wasn't specifying a parser, so swish wasn't
passing the document off to the libxml2 parser, which would have flushed the
input stream. (Remember I changed the default parser to be libxml2 if
linked in.)
If sunsite's mail weren't so slow you would have seen my response -- I agree
that it would be better to have spider.pl return only <title> as the
contents -- then it would mimic NoContents. As I said, there's no point in
returning content that's just going to get flushed anyway.
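
To illustrate the idea of returning only <title> as the contents, here is a
rough sketch of what a -S prog program might emit per document. It is in
Python rather than Perl and is not spider.pl's actual code. The function
names (extract_title, prog_record) are made up for this example, and the
header names (Path-Name, Content-Length, No-Contents) are as I recall them
from the -S prog documentation, so check them against your swish-e version.

```python
import re


def extract_title(html: str) -> str:
    """Return the text of the first <title> element, or '' if none (hypothetical helper)."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else ""


def prog_record(path: str, html: str, no_contents: bool = False) -> bytes:
    """Build one document record: header lines, a blank line, then the body.

    Assumption: header names follow the -S prog documentation as recalled.
    With no_contents=True the body is reduced to just the <title> element,
    so nothing is sent that would only be thrown away.
    """
    headers = [f"Path-Name: {path}"]
    if no_contents:
        body = f"<title>{extract_title(html)}</title>".encode("utf-8")
        headers.append("No-Contents: 1")
    else:
        body = html.encode("utf-8")
    # Content-Length must match the body actually sent, counted in bytes.
    headers.append(f"Content-Length: {len(body)}")
    return ("\n".join(headers) + "\n\n").encode("utf-8") + body
```

The point of the sketch is that Content-Length is computed after the body
is trimmed, so the reader never has leftover bytes on the input stream,
whichever parser ends up handling the document.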
--
Bill Moseley
mailto:moseley@hank.org
Received on Thu Sep 26 19:41:40 2002