Request for discussion - how to make MC unicode capable
Vladimir Nadvornik
nadvornik at suse.cz
Mon Feb 26 12:32:27 UTC 2007
On Sunday 25 February 2007 14:41, Leonard den Ottolander wrote:
> Hello Pavel,
>
> On Sat, 2007-02-24 at 14:57 +0200, Pavel Tsekov wrote:
> > I'd like to initiate a discussion on how to make MC
> > unicode deal with multibyte character sets.
> >
The current utf-8 patches are based on utf-8 support in glibc.
I don't know if utf-8 is needed on other systems.
>
> Just a few thoughts:
>
> - Because multibyte is rather more memory hungry I think the user should
> still have the option to toggle the use of an 8bit path either in the
> interface or at compile time. This means where the UTF-8 patches replace
> paths we should preferably implement two paths.
The situation with the utf-8 patches is as follows:
In the editor, utf-8 text is converted to wchar. This requires up to 4 times
more memory (wchar_t is 4 bytes in glibc), but lets the code stay almost the same.
In the rest of mc the utf-8 charset is used directly and the memory
requirements are more or less the same as with 8bit charsets.
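
To put rough numbers on the memory difference, here is a minimal sketch that
compares the size of a utf-8 string with the size of the same text stored as
wchar_t. The helper names utf8_codepoints() and wide_bytes() are made up for
this illustration and are not taken from the patches; the input is assumed to
be valid utf-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the code points in a valid UTF-8 string.
   Every byte that is not a continuation byte (10xxxxxx) starts a new
   character. */
size_t
utf8_codepoints (const char *s)
{
    size_t n = 0;

    assert (s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: bytes needed to hold the same text as an array of
   wchar_t, not counting the terminator. */
size_t
wide_bytes (const char *s)
{
    return utf8_codepoints (s) * sizeof (wchar_t);
}
```

For plain ASCII text the utf-8 form takes one byte per character, so with a
4-byte wchar_t the wide buffer is 4 times larger. For text with many multibyte
characters the ratio is smaller.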
> - I suppose a lot of the code of the UTF-8 patch can be reused, only we
> will need to add iconv() calls in the appropriate places. libiconv is
> already expected so not much trouble with the make files there. Iconv
> should only be used for the multibyte path, not the 8bit path. Using the
> multibyte path would still enable users to translate from one 8bit
> charset to another.
> - Unsupported character substitution character should be an ini option
> (and define some defaults for all/many character sets). (I'm not sure
> question mark is supported in all character sets.)
> - Users should be able to set character set per directory (mount). Of
> course there should be a system wide default taken from the environment
> (but also overridable).
> - Copy/move dialogs should have a toggle to iconv the file name or do a
> binary name copy.
> - Maybe copy/move dialogs should also have a toggle to iconv file
> content, which could be quite usable for text files. A warning dialog on
> every copy/move (that the user explicitly has to disable) might be a
> good addition then, to help uninformed users avoiding to screw up their
> data.
>
The code in charsets.c is not compatible with utf-8 and needs to be completely
rewritten. For example, the function convert_to_display(char *str) can't be
used for converting to utf-8, because the string can actually grow in that case.
With the current utf-8 patches, the charsets code can't be used in utf-8 locales.
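
To illustrate why an interface that only takes char *str is a problem, here is
a minimal sketch of a conversion that returns a newly allocated buffer instead.
It assumes ISO-8859-1 input only to keep the example short, and the name
latin1_to_utf8_alloc() is invented for this mail; it is not a function from mc
or from the patches. A real rewrite would more likely go through iconv().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert an ISO-8859-1 string to UTF-8.
   Each byte >= 0x80 becomes two bytes, so the result can be up to twice
   as long as the input and can't be written back into the same buffer.
   Returns a malloc()ed string the caller must free, or NULL on failure. */
char *
latin1_to_utf8_alloc (const char *src)
{
    size_t len;
    char *dst, *out;

    assert (src != NULL);
    len = strlen (src);
    dst = malloc (2 * len + 1);     /* worst case: every byte doubles */
    if (dst == NULL)
        return NULL;

    out = dst;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char) *src;

        if (c < 0x80) {
            *out++ = (char) c;                      /* ASCII: copied as is */
        } else {
            *out++ = (char) (0xC0 | (c >> 6));      /* lead byte */
            *out++ = (char) (0x80 | (c & 0x3F));    /* continuation byte */
        }
    }
    *out = '\0';
    return dst;
}
```

A 10-byte ISO-8859-1 string with one accented letter comes out as 11 bytes of
utf-8, so the caller has to receive a new buffer (or pass in a size) rather
than have the original overwritten.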
--
Vladimir Nadvornik
developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o. e-mail: nadvornik at suse.cz
Lihovarská 1060/12 tel:+420 284 028 967
190 00 Praha 9 fax:+420 284 028 951
Czech Republic http://www.suse.cz
More information about the mc-devel mailing list