Re: Indexing UTF-8 IIS Pages

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Aug 05 2004 - 13:55:33 GMT
On Wed, Aug 04, 2004 at 04:50:46AM -0700, Mammitzsch.T@zdf.de wrote:
> Hi everybody,
> 
> i try to spider an IIS 6.0 which delivers pages with utf-8 in the
> http-header. As far as i understood the manual, swish-e converts utf-8 to
> iso-8859-1 if i use libxml2 (html2-parser). Unfortunately special chars like
> german umlauts are not recognized if i search through the swish.cgi
> frontend. Also results with umlauts are not displayed correctly. swish-e
> runs on a sun e450 with solaris 5.8. Any ideas?

Thomas,

It was never clear from your messages whether you can search the index
with swish-e directly from the command line and find the words with
umlauts.

If that works but swish.cgi doesn't, then the problem may be either
your version of Perl or the encoding used when the query is sent to
swish-e.

You might turn on debugging in swish.cgi to log the query sent to
swish-e and make sure that it looks like it is ISO-8859-1 encoded.

I think this is what you need:

    debug_options => 'command',
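
Once the query is logged, the bytes of an umlaut show which encoding
was used: "ü" is the single byte 0xFC in ISO-8859-1 but the two bytes
0xC3 0xBC in UTF-8. The following is only a rough sketch in Python (not
part of swish.cgi; the function names are made up for illustration) of
how one might classify the logged bytes and convert a UTF-8 query to
ISO-8859-1. It assumes the index really does hold ISO-8859-1 words, as
described in the original message.

```python
def describe_query_bytes(raw: bytes) -> str:
    """Roughly classify the bytes of a logged query (hypothetical helper)."""
    try:
        raw.decode("ascii")
        return "ascii"          # no umlauts at all, encoding is irrelevant
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"          # valid multi-byte sequences were found
    except UnicodeDecodeError:
        # Not valid UTF-8, so most likely a single-byte encoding
        # such as ISO-8859-1.
        return "single-byte"


def utf8_to_latin1(raw: bytes) -> bytes:
    """Re-encode a UTF-8 query as ISO-8859-1 (hypothetical helper).

    Raises UnicodeEncodeError if a character has no ISO-8859-1 equivalent.
    """
    return raw.decode("utf-8").encode("iso-8859-1")


word = "Müller"
utf8 = word.encode("utf-8")          # ü becomes the two bytes C3 BC
latin1 = word.encode("iso-8859-1")   # ü becomes the single byte FC

print(describe_query_bytes(utf8), utf8.hex())
print(describe_query_bytes(latin1), latin1.hex())
print(utf8_to_latin1(utf8) == latin1)
```

If the logged query turns out to be UTF-8 while the index holds
ISO-8859-1, a search for a word with an umlaut would not match, which
would fit the symptoms described.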

-- 
Bill Moseley
moseley@hank.org

Received on Thu Aug 5 06:55:50 2004