Re: HTML entities?

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Fri Dec 22 2006 - 19:44:34 GMT

max thom stahl scribbled on 12/22/06 11:55 AM:
> I'm having a problem where HTML entities in the meta descriptions of my 
> site aren't getting interned into the index properly. Is there a way 
> that I could either have Swish-e encode its index in UTF-8 or have the 
> spider.pl that came with Swish-e scrape the entities out for me?
> 

If the entities you are describing represent Unicode code points above 255
(that is, characters outside Latin-1), then you may be running into problems
with the UTF-8 -> Latin-1 conversion, not the entities per se.
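
One quick way to check whether a description contains such characters is to
decode the entities yourself and look for code points above 255. The Python
sketch below is only an illustration, not part of Swish-e or spider.pl; the
function name and the sample string are made up for this example.

```python
import html

def non_latin1_chars(text):
    """Decode HTML entities, then list the characters Latin-1 cannot hold."""
    decoded = html.unescape(text)
    # Latin-1 covers code points 0..255; anything higher has no single-byte form.
    return [(ch, ord(ch)) for ch in decoded if ord(ch) > 255]

# Hypothetical meta description: &eacute; (233) fits in Latin-1,
# while &euro; (8364) and &hellip; (8230) do not.
sample = "Caf&eacute; menu from &euro;5 &hellip;"
for ch, codepoint in non_latin1_chars(sample):
    print(f"U+{codepoint:04X} {ch!r} is outside Latin-1")
```

If this prints nothing for your text, the conversion is probably not the
culprit and the entities themselves are worth a closer look.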

Swish-e v2 does not support UTF-8. Plans are in the works for v3 to support 
UTF-8 but that is some time away.

Please post the HTML text that is causing the problem; a small test document
is most helpful.

See also: http://swish-e.org/docs/swish-config.html#converthtmlentities

pek
-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Fri Dec 22 11:44:39 2006