Re: Indexing UTF-8 IIS Pages

From: Tim Freedom <tim_freedom(at)not-real.yahoo.com>
Date: Thu Aug 05 2004 - 04:28:00 GMT
--- Mammitzsch.T@zdf.de wrote:
> I am trying to spider an IIS 6.0 server that delivers pages with UTF-8
> declared in the HTTP header. As far as I understood the manual, Swish-e
> converts UTF-8 to ISO-8859-1 if I use libxml2 (the HTML2 parser).
> Unfortunately, special characters such as German umlauts are not recognized
> when I search through the swish.cgi front end, and results containing
> umlauts are not displayed correctly. Swish-e runs on a Sun E450 with
> Solaris 5.8. Any ideas?

Swish-e is not UTF-8 friendly and won't index UTF-8 documents properly. I
have heaps of Arabic, Farsi and Urdu documents that I can't search because
of this limitation, and I continue to hope that someone will remedy it and
bring proper UTF-8 support. As has been noted on this list in the past,
sadly that doesn't seem likely to happen anytime soon, although the demand
is there.
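A minimal Python sketch, independent of Swish-e itself, of why a UTF-8 to ISO-8859-1 conversion can help with German text but not with Arabic, Farsi or Urdu: ISO-8859-1 has code points for umlauts but none for Arabic script. It also shows one plausible cause of garbled umlauts in results, namely UTF-8 bytes being read as if they were ISO-8859-1. The helper name `to_latin1` is hypothetical and only models the charset step, not Swish-e's internals.

```python
from typing import Optional


def to_latin1(text: str) -> Optional[bytes]:
    """Return the ISO-8859-1 bytes for text, or None if any character
    has no ISO-8859-1 code point (hypothetical helper for illustration)."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None


german = "für"    # u-umlaut exists in ISO-8859-1
arabic = "كتاب"   # Arabic letters have no ISO-8859-1 code points

print(to_latin1(german))  # b'f\xfcr'
print(to_latin1(arabic))  # None

# Reading raw UTF-8 bytes as ISO-8859-1 (no conversion at all) garbles umlauts:
print(german.encode("utf-8").decode("iso-8859-1"))  # fÃ¼r
```

If umlauts come out as pairs like "Ã¼" in the search results, that pattern suggests the UTF-8 bytes were never converted; if Arabic text disappears entirely, it is because the target charset simply cannot represent it.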

 .tf.

Received on Wed Aug 4 21:28:16 2004