> On Wed, Aug 04, 2004 at 04:50:46AM -0700, Mammitzsch.T@zdf.de wrote:
> > Hi everybody,
> >
> > I am trying to spider an IIS 6.0 server which delivers pages with utf-8
> > in the HTTP header. As far as I understand the manual, swish-e converts
> > utf-8 to iso-8859-1 if I use libxml2 (the HTML2 parser). Unfortunately,
> > special characters like German umlauts are not recognized if I search
> > through the swish.cgi frontend. Also, results with umlauts are not
> > displayed correctly. swish-e runs on a Sun E450 with Solaris 5.8.
> > Any ideas?
>
> Basically what Peter said. One thing you should try: while indexing
> and spidering a few small test files, use the options
>
> -T parsed_words indexed_words
>
> which will show you what white-space separated words are being fed to
> swish and how they are converted into words stored in the index (via
> WordCharacters setting).
>
OK, the indexer reported, for example:
White-space found word 'Saarbrucken'
Adding:[648:swishdefault(1)] 'saarbrucken' Pos:397 Stuct:0x9 ( BODY FILE )
That looks good to me, but searching for saarbrucken returns lots of results
where "saarbrucken" does not appear.
Other words with umlauts return no results (except one PDF which I found).
Why isn't it working when searching?
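
One possible explanation, not confirmed anywhere in this thread, is an
encoding mismatch between the index and the query: if the index holds
iso-8859-1 bytes but the search form submits the query as utf-8, a word
with an umlaut will not match byte for byte. The small Python sketch below
only illustrates that mismatch. It is not swish-e code, and the word
"Saarbrücken" with an umlaut is an assumption (the trace above shows it
without one).

```python
# Illustration only: compares the bytes of one word in two encodings.
# Assumption: the page contained "Saarbrücken" with an umlaut.
word = "Saarbrücken"

# What an index would hold after a utf-8 -> iso-8859-1 conversion (assumed).
latin1_bytes = word.encode("iso-8859-1")

# What a browser would send if the search form is served as utf-8 (assumed).
utf8_bytes = word.encode("utf-8")

print("iso-8859-1:", latin1_bytes, len(latin1_bytes), "bytes")
print("utf-8:     ", utf8_bytes, len(utf8_bytes), "bytes")
print("same bytes:", latin1_bytes == utf8_bytes)

# Re-encoding the query to iso-8859-1 before searching makes the bytes agree.
query_fixed = utf8_bytes.decode("utf-8").encode("iso-8859-1")
print("after re-encoding:", query_fixed == latin1_bytes)
```

If this is what is happening, checking which charset the swish.cgi result
page and search form declare would be a reasonable next step.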
bye, Thomas Mammitzsch
Received on Wed Aug 4 09:01:39 2004