> >
> > On Wed, Aug 04, 2004 at 04:50:46AM -0700, Mammitzsch.T@zdf.de wrote:
> > > Hi everybody,
> > >
> > > I am trying to spider an IIS 6.0 server which delivers pages with
> > > UTF-8 in the HTTP header. As far as I understood the manual, swish-e
> > > converts UTF-8 to ISO-8859-1 if I use libxml2 (the HTML2 parser).
> > > Unfortunately, special characters like German umlauts are not
> > > recognized when I search through the swish.cgi frontend. Also,
> > > results with umlauts are not displayed correctly. swish-e runs on a
> > > Sun E450 with Solaris 5.8. Any ideas?
> >
> > Basically what Peter said. One thing you should try while indexing
> > and spidering (a few small test files) is to use the options
> >
> > -T parsed_words indexed_words
> >
> > which will show you what white-space separated words are being fed to
> > swish and how they are converted into words stored in the index (via
> > the WordCharacters setting).
> >
> OK, the indexer printed, e.g.:
>
> White-space found word 'Saarbrucken'
> Adding:[648:swishdefault(1)] 'saarbrucken' Pos:397
> Stuct:0x9 ( BODY FILE )
>
> That looks good to me, but searching for saarbrucken returns lots of
> results where "saarbrucken" is not included.
> Other words with umlauts return no results (except one PDF which I
> found).
>
> Why isn't it working when searching?
>
> bye, Thomas Mammitzsch
Hmm, the umlauts were stripped out of my post. I originally wrote saarbrucken
with a "u" with two dots above (German umlaut).
bye, Thomas Mammitzsch
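
One possible cause (an assumption, not something confirmed in this thread) is
an encoding mismatch between the index and the query: if the indexed words
are stored as ISO-8859-1 but the search form submits the query as UTF-8, the
bytes for a word with an umlaut differ and the lookup cannot match. A minimal
Python sketch of that mismatch follows; the helper name to_index_encoding is
hypothetical and is not part of swish-e or swish.cgi.

```python
# Illustration only: shows why a UTF-8 query cannot match an ISO-8859-1 index.
word = "Saarbr\u00fccken"  # "Saarbruecken" written with a real u-umlaut

utf8_bytes = word.encode("utf-8")         # what a UTF-8 page or form would send
latin1_bytes = word.encode("iso-8859-1")  # what an ISO-8859-1 index would hold

# The umlaut takes two bytes in UTF-8 but only one byte in ISO-8859-1.
print(utf8_bytes)
print(latin1_bytes)

# Reading the UTF-8 bytes as if they were ISO-8859-1 gives two wrong
# characters in place of the umlaut, which is also how results can end up
# displayed incorrectly.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)


def to_index_encoding(query_bytes: bytes) -> bytes:
    """Hypothetical helper: convert a UTF-8 query to ISO-8859-1 bytes."""
    return query_bytes.decode("utf-8").encode("iso-8859-1")


# After conversion the query bytes equal the bytes stored in the index.
print(to_index_encoding(utf8_bytes) == latin1_bytes)
```

If this is what is happening, either the query would need converting before
it reaches swish-e, or the search page would need to be served as ISO-8859-1
so that the browser submits the form in that encoding.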
Received on Wed Aug 4 09:28:04 2004