Midnight Commander within console in UTF-8 mode

Andy Rysin arysin at yahoo.com
Wed Oct 24 03:45:44 UTC 2001


Hello Pavel!

--- Pavel Roskin <proski at gnu.org> wrote:
...
> My guess is that MC is most likely not responsible for that.  You could
> check how the screen library (included S-Lang, system installed S-Lang,
> ncurses) affects this behavior.  Also you could look into
> "Options->Display bits..." in MC and make sure that you have "Full 8 bits
> output" enabled.
Full 8-bit output is definitely turned on, and I'll try experimenting
with those libraries soon.

> You could also describe your environment better so that I could try it
> for you.  I cannot even get "ls -l" to work properly.  What version of
...
I am using Mandrake 8.1:
linux-2.4.12, glibc-2.2.4, KDE 2.2.1, XFree86 4.1.0, mc-4.5.55

Two commands to try:
$ export LC_ALL=ru_RU.UTF-8
$ echo -en "\\033%G"

The first one makes console programs display dates in Russian with
UTF-8 encoding (BTW, "LANG" won't do that), and the second switches the
console into UTF-8 mode so that UTF-8 output is displayed correctly
(you'll need a Unicode font). Then if you type 'ls -l', all the dates
will be displayed correctly in Russian. Likewise, if you do
"more utf8.txt", where utf8.txt is a text file in UTF-8, it will be
displayed correctly. With xterm you should use the option "-u8".
From `man xterm`:

       -u8     This option sets the encodingMode resource as ``utf8'',
               which makes xterm interpret incoming data as UTF-8.
               You will need a Unicode font.  This mode is default
               under UTF-8 locales.  This sets wideChars as a
               side-effect.  Note that +u8 is obsolete.
               See -8, -en, and -lc also.

> The right solution would be using mbswidth() (from gettext) instead of
> strlen() to calculate the length of the strings on the screen.
>
> There are also places in MC where it splits strings at a certain point.
> It should be ensured that the split is only done at the multibyte
> character boundaries.  Implementation of name_trunc() would be very
> non-trivial.  The worst thing is that it can affect the performance.
That's a real conversation! :)
name_trunc() might get more complex, but it's not hopeless. UTF-8 is a
self-synchronizing encoding: every byte that is not the first byte of a
character has its top two bits set to 10 (values 0x80-0xBF). So if we
break on such a byte, we just go back a few bytes to find the beginning
of the character.
As to performance, even in the worst case a panel averages about
300-400 files, and a few more instructions per name would not be hard
for a Pentium. Especially if we take into account that the visible part
is no more than 30-40 file names.
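
To make the idea above concrete, here is a rough sketch in C of backing
up to a character boundary before cutting a name. This is not MC's
actual code: the function names (utf8_is_continuation, utf8_char_count,
utf8_trunc_len) are made up for illustration, and the input is assumed
to be valid UTF-8. Note also that counting characters is only an
approximation of the width on screen; wide and combining characters
need something like wcwidth() or the mbswidth() mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
int utf8_is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Count characters (code points) instead of bytes, as strlen() would.
 * Hypothetical helper; assumes s is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (!utf8_is_continuation((unsigned char) *s))
            n++;
    return n;
}

/* Return the largest byte length <= max_bytes at which s can be cut
 * without splitting a multibyte character.
 * Hypothetical helper; assumes s is valid UTF-8. */
size_t utf8_trunc_len(const char *s, size_t max_bytes)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (len <= max_bytes)
        return len;

    len = max_bytes;
    /* s[len] is the first byte that would be dropped.  If it is a
     * continuation byte, the cut falls inside a character, so step
     * back until the cut sits in front of a lead byte. */
    while (len > 0 && utf8_is_continuation((unsigned char) s[len]))
        len--;
    return len;
}
```

For example, for a name made of three two-byte Cyrillic letters (six
bytes), asking for at most 3 bytes gives a cut at 2 bytes, so only whole
characters are kept. The loop steps back at most a few bytes per name,
so the extra cost per panel entry is small.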

> Either you or somebody else fixes MC or you should make some workarounds.
I'm just giving the idea and some initial data. If there's nobody to
get involved with this now, I'll come back after a while and try to get
my hands on it (I'm afraid, though, that by then computers won't be
what we know them as now :)).
I just thought it would be faster if someone already working on MC
found it useful and implemented it.
...
> This will help you "survive" until MC supports multibyte characters.
I can live with single-byte encoding locales now, it's just that we
have to move on...

Take care,
Andriy 




More information about the mc-devel mailing list