Re: Indexing UTF-8 IIS Pages

From: Tim Freedom <tim_freedom(at)>
Date: Thu Aug 05 2004 - 04:28:00 GMT
--- wrote:
> I am trying to spider an IIS 6.0 server which delivers pages with UTF-8 in the
> HTTP header. As far as I understand the manual, Swish-e converts UTF-8 to
> ISO-8859-1 if I use libxml2 (the HTML2 parser). Unfortunately, special characters
> like German umlauts are not recognized when I search through the swish.cgi
> frontend. Results with umlauts are also not displayed correctly. Swish-e
> runs on a Sun E450 with Solaris 5.8. Any ideas?

Swish-e is not UTF-8 friendly and won't index such pages properly.  I have heaps
of Arabic, Farsi and Urdu docs which I can't search because of this limitation,
and I continue to hope that someone will remedy it and bring forth
proper UTF-8 support.  Sadly, as has been noted on this list in the past, that
doesn't sound like it will happen anytime soon (although the
demand is there for it).
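
To illustrate why a UTF-8 to ISO-8859-1 conversion helps with some text but not with others, here is a minimal Python sketch. It is not Swish-e code and makes no claim about how Swish-e or libxml2 work internally; the function names and sample strings are made up for the example. It only shows that German umlauts have a slot in ISO-8859-1 while Arabic letters do not, and that the same character has different bytes in the two encodings.

```python
def survives_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def utf8_to_latin1(raw: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as ISO-8859-1.

    Characters with no ISO-8859-1 equivalent are replaced by '?'.
    """
    return raw.decode("utf-8").encode("iso-8859-1", errors="replace")


# Hypothetical sample words, chosen only for illustration.
german = "Müller"
arabic = "سلام"

print(survives_latin1(german))  # umlauts fit into ISO-8859-1
print(survives_latin1(arabic))  # Arabic letters do not

# The same character is a different byte sequence in each encoding,
# so text in one encoding will not match bytes stored in the other.
print("ü".encode("utf-8"))
print("ü".encode("iso-8859-1"))

print(utf8_to_latin1(arabic.encode("utf-8")))
```

If the index holds ISO-8859-1 bytes but the search form submits or displays UTF-8, a mismatch of this kind could be one explanation for the umlaut problem described above. That is a guess, not a diagnosis.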


Received on Wed Aug 4 21:28:16 2004