On Wed, Sep 12, 2007 at 03:40:04PM -0500, Peter Karman wrote:
> So likely there's an issue with spider.pl and how it is calculating length()
> for docs with unreliable encodings. That's my guess anyway. spider.pl could
> probably be made smarter about sanity checking the docs for length and
> encoding, and made to fail gracefully somehow. I know there's been talk here
> lately about some of the encoding stuff it does.
The spider just needs to *always* decode on input, then encode back to
the original charset, and then use length() to report the length.
That seems like the simplest and most correct way to go. Seems right to
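A rough sketch of that decode-on-input, re-encode-on-output flow follows. spider.pl itself is Perl and would do this with the Encode module's decode/encode. This version uses Python's bytes.decode/str.encode purely for illustration, and the names Document, read_document and encode_for_output are hypothetical, not anything in spider.pl. It assumes the original charset is already known (for example from the server's Content-Type). The point it shows: the length of the decoded text counts characters, so the length to report is taken only after encoding back to the original charset, where it counts bytes.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str     # decoded content: a sequence of characters
    charset: str  # the charset the document arrived in (assumed known)


def read_document(raw: bytes, charset: str) -> Document:
    # Always decode on input. A wrong or unreliable charset fails here with
    # UnicodeDecodeError, in one place, instead of yielding a bad length later.
    return Document(text=raw.decode(charset), charset=charset)


def encode_for_output(doc: Document) -> tuple[bytes, int]:
    # Encode back to the original charset, then measure the bytes.
    body = doc.text.encode(doc.charset)
    return body, len(body)


doc = read_document("café".encode("utf-8"), "utf-8")
body, length = encode_for_output(doc)
print(len(doc.text), length)  # 4 characters, 5 bytes
```

Catching the decode error in read_document would be one place to make the spider skip or log a bad document gracefully, though how spider.pl should handle that is left open here.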
Received on Wed Sep 12 16:44:11 2007