
Re: Once again filename encoding problems - Macos-X Tiger

From: Thomas Nyman <thomas(at)>
Date: Thu Nov 17 2005 - 09:31:48 GMT
Back again, and making some headway :)

I've discovered that the Mac OS X filesystem stores filenames as UTF-8
(normalized, according to some).
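To see why normalization matters, the sketch below (Python 3, standard library only) compares the two Unicode forms of a name containing "Ä". HFS+, the default filesystem on Tiger, is documented as storing filenames in a decomposed form close to NFD, while text typed elsewhere is usually in the precomposed NFC form. The filename "Ärende.doc" is a made-up example.

```python
import unicodedata

# Hypothetical filename containing a Swedish umlaut (precomposed "Ä").
name = "\u00c4rende.doc"

# NFC: "Ä" is one precomposed code point (U+00C4).
nfc = unicodedata.normalize("NFC", name)
# NFD: "A" followed by COMBINING DIAERESIS (U+0308).
nfd = unicodedata.normalize("NFD", name)

print(nfc == nfd)            # False: different code point sequences
print(len(nfc), len(nfd))    # 10 11
print(nfc.encode("utf-8"))   # b'\xc3\x84rende.doc'
print(nfd.encode("utf-8"))   # b'A\xcc\x88rende.doc'
```

Both byte strings are valid UTF-8 and render as the same text, but a byte-for-byte comparison, or a search for the NFC form, will not match the NFD form.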

Anyway, if I change the following line in TemplateDefault from

    my $output = $q->header . page_header( $results );

to

    my $output = $q->header(-charset=>'UTF-8') . page_header( $results );

then filenames with umlauts are displayed correctly. However, the
content of swishdescription is then displayed incorrectly.
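One plausible reading of this symptom (not confirmed here) is that the page mixes two encodings: the filenames arrive as UTF-8 bytes, while the descriptions extracted from the documents are ISO-8859-1 bytes. Whichever charset the header declares, one of the two is then decoded wrongly. A minimal Python 3 sketch of both failure modes, using an arbitrary Swedish word:

```python
# Arbitrary Swedish word with two non-ASCII letters ("smörgås").
word = "sm\u00f6rg\u00e5s"

utf8_bytes = word.encode("utf-8")         # ö -> c3 b6, å -> c3 a5
latin1_bytes = word.encode("iso-8859-1")  # ö -> f6,    å -> e5

# UTF-8 bytes shown as ISO-8859-1: each non-ASCII letter becomes two characters.
as_latin1 = utf8_bytes.decode("iso-8859-1")
# ISO-8859-1 bytes shown as UTF-8: invalid sequences become U+FFFD.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(as_latin1)        # smÃ¶rgÃ¥s
print(ascii(as_utf8))   # 'sm\ufffdrg\ufffds'
```

The first pattern ("Ã" followed by another symbol) points to UTF-8 data under a Latin-1 label; replacement characters or question marks point to Latin-1 data under a UTF-8 label.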

Since most of the documents are Word documents parsed through
catdoc, I changed my swish.conf as follows:

    FileFilter .doc /usr/local/bin/catdoc "-b -s8859-1 -dutf-8 '%p' "
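To confirm that the filter output really is UTF-8 after this change, the captured output can be decoded strictly; ISO-8859-1 text containing umlauts will almost always fail. This is a Python 3 sketch with hypothetical function names: `latin1_to_utf8` only performs the same character-set conversion that the `-s8859-1 -dutf-8` options request from catdoc, it does not read .doc files.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (every byte maps to one character)."""
    return data.decode("iso-8859-1").encode("utf-8")
```

For example, save the filter output with `/usr/local/bin/catdoc -b -s8859-1 -dutf-8 some.doc > out.txt` (some.doc being any test document) and pass the bytes of out.txt to `is_valid_utf8`. A pure-ASCII sample passes either way, so the test is only meaningful on text that contains umlauts.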

The results now show filenames with umlauts correctly, but some parts
still display incorrectly. The descriptions of the file contents and
the highlighting are mostly correct, with one or two faulty
characters, but now parts of the form display incorrectly. I'm
enclosing a screenshot of what it looks like.
Oh, and the browser's default encoding is UTF-8.

The issue seems to point to some part of the generated HTML page
setting an encoding other than UTF-8. The question is: where is this
being set?
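Places worth checking are the HTTP `Content-Type` header, any `<meta http-equiv="Content-Type">` tag in the page head, and any `accept-charset` attribute on the `<form>`; which of these the swish.cgi templates actually emit is not confirmed here. A minimal Python 3 sketch that lists every such declaration in a saved response (headers plus body) follows; the function name is hypothetical and the regular expression is a rough heuristic, not an HTML parser.

```python
import re
from typing import List, Tuple

# Rough heuristic: matches "charset=..." (Content-Type header or <meta>)
# and "accept-charset=..." (on <form>), with or without quotes.
CHARSET_RE = re.compile(
    rb"""(accept-charset|charset)\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def find_charset_declarations(raw: bytes) -> List[Tuple[int, str, str]]:
    """Return (line number, attribute, value) for each charset declaration."""
    hits = []
    for lineno, line in enumerate(raw.splitlines(), start=1):
        for m in CHARSET_RE.finditer(line):
            attribute = m.group(1).decode("ascii").lower()
            value = m.group(2).decode("ascii")
            hits.append((lineno, attribute, value))
    return hits
```

`wget --save-headers -O page.html 'URL'` (URL standing for the search-results address) keeps the response headers at the top of page.html, so `find_charset_declarations(open("page.html", "rb").read())` lists the header charset and any in-page declarations with their line numbers. More than one distinct value is a strong hint of where the mismatch comes from.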

Thanks for any help at all


On 14 Nov 2005, at 17:27, Bill Moseley wrote:

> On Mon, Nov 14, 2005 at 03:10:25PM +0100, Thomas Nyman wrote:
>> Is there a way I can check whether this is a Perl issue or not, for
>> instance by writing a Perl script that lists the contents of a
>> directory? If that lists the filenames correctly, then perhaps it's
>> not a Perl issue?
> How about looking at the output of swish-e directly compared with the
> output from wget fetching the same results?  Use something like od to
> dump the characters.
> -- 
> Bill Moseley
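Along the lines of that suggestion, here is a small Python 3 stand-in for `od` (the function name `dump` is hypothetical) that prints octal offsets, hex bytes and printable ASCII. When comparing the swish-e output with what wget fetches, "ä" shows up as `c3 a4` if it is precomposed UTF-8, as `61 cc 88` if it is decomposed (NFD) UTF-8, and as a single `e4` if it is ISO-8859-1. From the shell, `od -t x1` or `od -c` shows the same bytes.

```python
def dump(data: bytes, width: int = 16) -> str:
    """Return an od-style listing: octal offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as "." in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:07o}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)
```

Running `print(dump(open("swish_output.txt", "rb").read()))` on each saved output (file names being whatever you chose) and comparing the bytes around a filename with an umlaut shows directly which encoding each side produces.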
Received on Thu Nov 17 01:31:50 2005