Re: F_OPEN_TEXT in index.c

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 19 2002 - 23:54:37 GMT
At 03:06 PM 12/19/02 -0800, Kyle Judson wrote:
>I was having problems with SWISH 2.2.1, the old xml parser and some
>files on NT 2000.  The parser was complaining about them being "not well
>formed" and stuff.

I'll try not to complain about how Windows issues always come up!
Actually, it's not Windows' fault, but the old way that swish-e reads the
entire file into a buffer.

>I tracked the problem to the fact that the file is opened in index.c as
>F_READ_TEXT and then when it is read by read_stream in file.c the buffer
>is terminated at filelen instead of bytes read and the filelen was not
>updated.  In windows the test read strips some characters, in my case
>the 0x0D of the CR/LF pair at the end of each line.  The result was
>that the buffer was not completely filled and there was garbage at the
>end that the parser was choking on.
>
>I changed the F_READ_TEXT to F_READ_BINARY and that seems to have solved
>my problem but I am worried about an unforeseen side effect.  Am I OK?

I think that's the wrong way to fix it.  What needs to be done is to reset
the buffer length after reading from the file.  

Try resetting the buffer length in index.c right after the read_stream() call:

    fprop->fsize = strlen( rd_buffer );

I didn't test, and I think that there's more involved for a proper fix.
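
To illustrate the general idea of terminating the buffer at the number of
bytes actually read rather than at the file length, here is a minimal,
standalone sketch. This is not swish-e code: read_text_file() and its
parameters are hypothetical names made up for the example, and it assumes
the caller already has the on-disk size (e.g. from stat) to pass in as
"expected".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of swish-e: read up to `expected` bytes
 * from `path` in text mode, terminate the buffer at the number of bytes
 * actually read, and report that count through `out_len`.
 * Returns a malloc'ed buffer (caller frees) or NULL on error. */
char *read_text_file(const char *path, size_t expected, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    /* Text mode: on Windows each CR/LF pair is returned as a single LF,
     * so fewer bytes than the on-disk size may come back. */
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    char *buf = malloc(expected + 1);   /* +1 for the terminating NUL */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* fread returns the number of bytes it really stored in buf. */
    size_t got = fread(buf, 1, expected, fp);
    fclose(fp);

    buf[got] = '\0';    /* terminate at bytes read, not at `expected` */
    *out_len = got;     /* the caller should use this as the new length */
    return buf;
}
```

The point of the sketch is that the count returned by fread is used both
to terminate the buffer and to update the length, so nothing past the last
byte read is ever handed to the parser.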

The HTML2, XML2, and TXT2 parsers read from the file handle until eof, so
the size of the file is not important.  The exception is with -S prog where
you have to tell it how many bytes to read (and in Windows the external
program writes in text mode so swish's -S mode has to read in text mode).



-- 
Bill Moseley
mailto:moseley@hank.org
Received on Thu Dec 19 23:54:47 2002