Swish-e and the accented characters

From: Nicolas Huillard <nhuillard(at)>
Date: Fri Sep 04 1998 - 09:16:35 GMT
I have been using Swish-e since yesterday on a French web site.

First of all, I would like to thank Roy, Giulia, Kirk, Mary and Kevin for this wonderful work (I tried to use Excite for Web Server, but it was awfully complex, buggy and coded "comme un cochon", i.e. like a pig. The only good thing in that product is the Web administration interface: Kirk should have a look at [...]). It took me only two days to go from downloading the software to using it fully on my web site, which will index 30000 documents (only 2000 for today): compiling the sources, understanding the functionality, coding a tailored CGI script that integrates Swish-e into the core of my site, fully formatted output and so on.

The purpose of this message is to ask questions about accented characters:
* Swish-e indexes words in lowercase and converts accented characters to their unaccented equivalents. I think that is a good thing.
* It doesn't convert accented characters in search words: the same word (e.g. embêté) is indexed as "embete" but searched as "embêté", so it is not found in the index.
I think it would be a good idea to process indexed words and search words the same way, so that such words can be found. Maybe this is already implemented in release 1.2 (which I couldn't find on SunSITE).
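To illustrate the idea, here is a minimal sketch in Python (not Swish-e's actual C code) of applying one folding function to both the indexed words and the search words. The names `fold`, `build_index` and `search` are hypothetical and exist only for this example; the accent stripping relies on Unicode decomposition, which is an assumption about how the conversion could be done, not a description of what Swish-e does internally.

```python
import unicodedata


def fold(word: str) -> str:
    """Lowercase a word and drop its accents (hypothetical helper)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Keep only the base letters, dropping the combining accent marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(words):
    """Store every word in its folded form."""
    return {fold(w) for w in words}


def search(index, query: str) -> bool:
    """Fold the query exactly like the indexed words before the lookup."""
    return fold(query) in index


index = build_index(["Embêté", "site"])
print(fold("embêté"))            # embete
print(search(index, "embêté"))   # True
print(search(index, "embete"))   # True
```

Because the same function runs on both sides, "embêté" and "embete" end up as the same key, whichever form the user types.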
I would also like to learn a little more about the processing of HTML entities: some are converted to unaccented characters in the index, and some aren't (those without an ASCII equivalent, like &#156; &laquo; &degree;, and those in the word "h&eacute;patospl&eacute;nom&eacute;galie.", which is 41 characters long with the final dot).
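As an illustration of one possible approach (an assumption on my side, not how Swish-e handles entities), the sketch below decodes HTML entities first and only then removes the accents, using Python's standard `html.unescape`. The helper names `fold` and `index_form` are hypothetical. It shows that the 41-character raw string is really a 20-character word once decoded, and that a character such as "«" has no unaccented ASCII form and therefore survives the folding.

```python
import html
import unicodedata


def fold(word: str) -> str:
    """Lowercase a word and drop its accents (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_form(raw: str) -> str:
    """Decode HTML entities, then fold the result (hypothetical helper)."""
    return fold(html.unescape(raw))


raw = "h&eacute;patospl&eacute;nom&eacute;galie."
decoded = html.unescape(raw)

print(len(raw))             # 41
print(decoded)              # hépatosplénomégalie.
print(len(decoded))         # 20
print(index_form(raw))      # hepatosplenomegalie.

# "&laquo;" decodes to a character that has no accent to strip,
# so it stays as it is after folding.
print(index_form("&laquo;"))  # «
```

Decoding before measuring the word length also matters if the indexer limits word length, since the limit would then apply to the real word and not to its entity-encoded form.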


