Re: [swish-e] can't index with spider.pl

From: Louis-David Mitterrand <vindex+lists-swish-e(at)not-real.apartia.org>
Date: Fri Dec 07 2007 - 11:14:09 GMT
On Thu, Dec 06, 2007 at 01:50:58PM -0800, Bill Moseley wrote:
> On Thu, Dec 06, 2007 at 07:46:36PM +0100, Louis-David Mitterrand wrote:
> > 	http://trajan.apartia.fr/index.md:158: error: htmlParseEntityRef: expecting ';'
> 
> > 	Warning: Unknown header line: 'CTYPE html' from program spider.pl
> 
> Those are two different errors.  The first one means that libxml2
> found an entity but it wasn't terminated by a ';'.
> 
> <a href="http://www.dessy.com/?go=dresses&style
> 
> That's not valid.  You need 
> 
> <a href="http://www.dessy.com/?go=dresses&amp;style

This one is now fixed.
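
For anyone generating such links from a script, here is a minimal sketch of the escaping Bill describes, assuming the pages are built with Python. The helper name `href_attr` is hypothetical; `html.escape` is from the standard library. The URL is the truncated one quoted above, left as-is.

```python
from html import escape


def href_attr(url: str) -> str:
    """Escape a URL for use inside a double-quoted HTML attribute.

    A bare '&' becomes '&amp;', so the parser no longer sees an
    unterminated entity reference.
    """
    return escape(url, quote=True)


raw = "http://www.dessy.com/?go=dresses&style"
print(f'<a href="{href_attr(raw)}">')
```

Only values that are not yet escaped should go through such a helper, otherwise an existing `&amp;` would turn into `&amp;amp;`.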
> 
> The "Unknown header" is due to the spider reporting the incorrect
> length of a document to swish.

By trial and error I've found that adding:

	AddDefaultCharset ISO-8859-15

to /etc/apache2/conf.d/charset makes the problem go away. Previously no 
default charset was defined.
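
I have not confirmed why the charset matters. One plausible explanation (an assumption on my part, not something verified against spider.pl) is that without a declared charset the length gets counted in one unit, characters or bytes, while swish reads the other. The sketch below only shows how the two counts drift apart for non-ASCII text; the sample string and the helper name `char_and_byte_len` are made up for illustration.

```python
def char_and_byte_len(text: str, encoding: str) -> tuple[int, int]:
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(encoding))


sample = "café à 10 €"  # hypothetical page content with non-ASCII characters

for enc in ("iso-8859-15", "utf-8"):
    chars, nbytes = char_and_byte_len(sample, enc)
    print(f"{enc}: {chars} characters, {nbytes} bytes")
```

In a single-byte charset such as ISO-8859-15 the two counts agree, while in UTF-8 each accented character takes more than one byte. If a length is reported in one unit and read in the other, the reader ends up at the wrong offset, and part of the document body could then be mistaken for a header line.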

Thanks, Bill, for your help,
Received on Fri Dec 7 06:14:12 2007