max thom stahl scribbled on 12/22/06 11:55 AM:
> I'm having a problem where HTML entities in the meta descriptions of my
> site aren't getting interned into the index properly. Is there a way
> that I could either have Swish-e encode its index in UTF-8 or have the
> spider.pl that came with Swish-e scrape the entities out for me?
>
If the entities you are describing represent Unicode code points above 255
(that is, characters outside latin1), then you may be running into problems
with the UTF-8 -> latin1 conversion, not with the entities per se.
Swish-e v2 does not support UTF-8. Plans are in the works for v3 to support
UTF-8, but that is some time away.
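
To check whether a given description is affected, something like the
following rough sketch may help. It is plain Python, not part of Swish-e or
spider.pl; the helper name find_non_latin1 and the sample string are made up
for illustration. It decodes the entities and lists any characters that fall
outside latin1.

```python
import html


def find_non_latin1(text):
    """Decode HTML entities, then collect characters above U+00FF.

    Hypothetical helper for illustration only; it is not part of Swish-e.
    Returns (decoded_text, offenders).
    """
    decoded = html.unescape(text)
    # latin1 (ISO-8859-1) covers code points 0..255 only.
    offenders = [ch for ch in decoded if ord(ch) > 255]
    return decoded, offenders


# Made-up sample: &eacute; fits in latin1, while the dash and the curly
# quotes (U+2014, U+201C, U+201D) do not.
sample = "caf&eacute; &mdash; &#8220;quoted&#8221;"
decoded, offenders = find_non_latin1(sample)

# Show which characters would be lost in a latin1 conversion.
print([f"U+{ord(ch):04X}" for ch in offenders])

# One possible fallback: substitute "?" for anything latin1 cannot hold.
print(decoded.encode("latin-1", errors="replace"))
```

If the list of offenders is empty, the text should survive a latin1
conversion and the problem probably lies elsewhere.
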
Please post the HTML that is causing the problem; a small test doc is most
helpful.
See also: http://swish-e.org/docs/swish-config.html#converthtmlentities
pek
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Fri Dec 22 11:44:39 2006