On Sat, Oct 25, 2003 at 05:51:51PM -0700, J Robinson wrote:
> Hello Everyone;
>
> Sometimes when indexing HTML using the HTML2 backend,
> I get messages like these from SWISH-E:
>
> input conversion failed due to input error
> Bytes: 0x25 0x00 0x61 0x3E
That message is generated by libxml2, not by swish-e. It is swish-e code
that causes it to be printed, though, so there should be a way to print
the file name along with it.
> I know that it's multi-byte files that are causing
> the errors. Does anyone know if there's an easy
> workaround to avoid getting these, for example, to
> detect that a file is multi-byte in your -S prog and
> not index it?
I wonder if it's more a problem of libxml2 not figuring out the encoding
correctly -- or perhaps truly an invalid sequence of bytes for the given
encoding. How to deal with it probably depends on what the problem is.
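
If the offending files turn out to be UTF-16 or UTF-32 (the 0x00 byte in
the message above hints at that, but it is only a guess), a -S prog
program could check each file before handing it to swish-e. Below is a
rough Python sketch of such a check, not anything swish-e or libxml2
provides. The names looks_wide_encoded and should_index are made up for
illustration, and the heuristic assumes that a UTF-16/UTF-32 byte-order
mark or a NUL byte identifies the problem files. Ordinary UTF-8
multi-byte text passes the check and would still be indexed.

```python
import codecs

# Byte-order marks that signal UTF-16 or UTF-32 content.
_WIDE_BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def looks_wide_encoded(data: bytes) -> bool:
    """Heuristic (assumption): True if data looks like UTF-16/UTF-32.

    A wide-encoding BOM or any NUL byte is treated as a sign that the
    content is not ASCII-compatible text.
    """
    if data.startswith(_WIDE_BOMS):
        return True
    return b"\x00" in data


def should_index(path: str, sample_size: int = 4096) -> bool:
    """Hypothetical helper: inspect the first sample_size bytes of a file."""
    with open(path, "rb") as fh:
        return not looks_wide_encoded(fh.read(sample_size))
```

A prog program would call should_index(path) and simply not output the
files for which it returns False. This also rejects binary files that
contain NUL bytes, which is probably acceptable for an HTML index.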
--
Bill Moseley
moseley@hank.org
Received on Sun Oct 26 01:24:38 2003