On Wed, Aug 04, 2004 at 04:50:46AM -0700, Mammitzsch.T@zdf.de wrote:
> Hi everybody,
>
> I am trying to spider an IIS 6.0 server which delivers pages with utf-8 in
> the HTTP header. As far as I understand the manual, swish-e converts utf-8
> to iso-8859-1 if I use libxml2 (the HTML2 parser). Unfortunately, special
> characters like German umlauts are not recognized when I search through the
> swish.cgi frontend. Results containing umlauts are also not displayed
> correctly. swish-e runs on a Sun E450 with Solaris 5.8. Any ideas?
Basically what Peter said. One thing you should try: while spidering and
indexing a few small test files, add the options

    -T parsed_words indexed_words

These show which white-space separated words are being fed to swish-e and
how each one is converted into the words stored in the index (as
determined by the WordCharacters setting).
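
To make it clearer what those two traces let you compare, here is a rough
Python sketch of the idea. It is only a simplified model and not swish-e's
actual tokenizer: the function names and the two character sets are
hypothetical, chosen for illustration. It splits text on white space (the
"parsed" words), then breaks each word at any character outside a
WordCharacters-like set (the "indexed" words). It also shows what an umlaut
looks like when UTF-8 bytes are read as ISO-8859-1, which is one possible
cause of the symptoms described.

```python
# Simplified model only; NOT swish-e's real parsing code.
# All names and character sets below are hypothetical.

ASCII_ONLY = set("abcdefghijklmnopqrstuvwxyz0123456789")
WITH_UMLAUTS = ASCII_ONLY | set("äöüß")


def parsed_words(text):
    """Split text on white space, roughly what a parsed-words trace lists."""
    return text.split()


def indexed_words(token, word_characters):
    """Lowercase a token and break it at every character that is not
    in word_characters, returning the non-empty pieces."""
    pieces = []
    current = ""
    for ch in token.lower():
        if ch in word_characters:
            current += ch
        else:
            if current:
                pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces


def trace(text, word_characters):
    """Pair each white-space separated word with its index words."""
    return [(tok, indexed_words(tok, word_characters))
            for tok in parsed_words(text)]


def misdecoded(text):
    """Return text as it appears when its UTF-8 bytes are wrongly
    interpreted as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    for name, chars in (("ASCII only", ASCII_ONLY),
                        ("with umlauts", WITH_UMLAUTS)):
        print(name, trace("Grüße aus Köln", chars))
    print(repr(misdecoded("Köln")))
```

In this model, a word such as "Köln" stays whole only if "ö" is in the
character set; otherwise it is split into "k" and "ln", so a search for the
full word finds nothing. If the real trace output shows two odd characters
where one umlaut should be, the input was probably not converted from utf-8
before the words were split.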
--
Bill Moseley
moseley@hank.org
Received on Wed Aug 4 07:35:06 2004