Rainer Hofmann scribbled on 4/4/07 6:47 AM:
> Hi,
>
> the following situation gives me a headache:
>
> Windows clients (Lang=de, CP1252) put PDF files onto a fileserver (Linux,
> Lang=en_us.utf) via Samba.
> Those files are periodically indexed by swish-e located on the server.
> Works pretty well so far.
> But if clients use non-ASCII characters like öäüßÄÖÜ in their file or
> directory names, they run into trouble when searching for these files.
>
Sounds like a messy encoding problem. I assume Windows doesn't use UTF-8 for its
filesystem, and Swish-e converts UTF-8 to Latin1 (ISO-8859-1) where possible.
And who knows what Samba does about converting (or not converting) filenames
from the Windows fs encoding to the destination Linux fs encoding.
http://j3e.de/linux/convmv/man/#how_to_repair_samba_files
might address part of your issue.
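
Before converting anything, it may help to see which encoding the filenames on
the Linux side actually arrived in. The following is only a minimal sketch,
assuming Python 3 is available on the server; the function names
classify_name and scan are made up for illustration and are not part of
Swish-e, Samba, or convmv. It relies on the fact that CP1252/Latin1 bytes for
characters like ö or ß are almost never valid UTF-8 sequences.

```python
import os


def classify_name(raw: bytes) -> str:
    """Roughly classify a raw filename as 'ascii', 'utf-8', or 'legacy-8bit'.

    'legacy-8bit' means the bytes are not valid UTF-8, so they are probably
    in a single-byte encoding such as Latin1 or CP1252 (an assumption, not
    a certainty).
    """
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "legacy-8bit"


def scan(directory: bytes = b".") -> dict:
    """Map each raw filename in `directory` to its classification.

    Passing a bytes path makes os.listdir return undecoded bytes names.
    """
    return {name: classify_name(name) for name in os.listdir(directory)}
```

Calling scan(b"/path/to/share") (path is a placeholder) would show whether the
share holds a mix of UTF-8 and legacy names, which is the case convmv is meant
to repair.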
The ideal is to do everything in UTF-8, since it has code points for all
characters and is ASCII compatible. But (as is oft repeated here) Swish-e
doesn't yet handle UTF-8 well. In the meantime, I'd suggest standardizing on
Latin1, since that seems like the least evil compromise. Convert your filenames
with convmv to Latin1, then index with Swish-e, and then your GUI will need to
map between Latin1 and the Windows encoding (CP1252?) if retrieving from the
Windows fs (instead of from Samba).
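
To illustrate the mapping step, here is a small sketch in Python 3 (the helper
names are hypothetical, chosen only for this example). It assumes the names on
the server are UTF-8 and that the Windows side really is CP1252. For the
characters mentioned above (öäüßÄÖÜ), Latin1 and CP1252 use the same byte
values; the two differ only in the 0x80-0x9F range, so the mapping is mostly a
no-op for German umlauts.

```python
def utf8_name_to_latin1(raw: bytes) -> bytes:
    """Re-encode a UTF-8 filename as Latin1 (ISO-8859-1).

    Raises UnicodeEncodeError if the name contains a character that
    Latin1 cannot represent (for example the euro sign).
    """
    return raw.decode("utf-8").encode("latin-1")


def latin1_name_to_cp1252(raw: bytes) -> bytes:
    """Map a Latin1 filename to CP1252 for use on the Windows side.

    Bytes 0x80-0x9F decode to C1 control characters in Latin1, which
    CP1252 cannot encode, so such names raise UnicodeEncodeError.
    """
    return raw.decode("latin-1").encode("cp1252")


def cp1252_name_to_latin1(raw: bytes) -> bytes:
    """Map a CP1252 filename back to Latin1.

    Raises UnicodeEncodeError for CP1252-only characters such as the
    euro sign or curly quotes.
    """
    return raw.decode("cp1252").encode("latin-1")
```

For example, utf8_name_to_latin1("Größe.pdf".encode("utf-8")) gives
b"Gr\xf6\xdfe.pdf", and passing that through latin1_name_to_cp1252 returns the
same bytes unchanged.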
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Wed Apr 4 09:13:28 2007