mc and utf-8 again but different

Rostislav Beneš xbenes5 at fi.muni.cz
Mon Nov 12 02:14:10 UTC 2007


Hello,

In April I announced that I had chosen mc and UTF-8 as the topic of my
bachelor's thesis. Now I would like to present my results.

I started with the utf8-patch and tried to add support for changing the
encoding in vfs. I added a new prefix, "#enc:", to do it. First I
implemented this as a vfs_class, but there were problems with links. Then
I edited the mc_* functions of vfs directly, and that works.
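
To illustrate the "#enc:" idea, here is a minimal sketch of pulling an
encoding name out of a path. The path syntax (an "#enc:NAME" element that
ends at the next '/') and the function name path_get_encoding are
assumptions made for this example; they are not taken from the patches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not code from the patches.
 * Looks for an "#enc:NAME" element in path and copies NAME into enc.
 * Assumes NAME ends at the next '/' or at the end of the string.
 * Returns 1 if an encoding was found and fits into enc, 0 otherwise. */
static int
path_get_encoding (const char *path, char *enc, size_t enc_size)
{
    static const char prefix[] = "#enc:";
    const char *start, *end;
    size_t len;

    start = strstr (path, prefix);
    if (start == NULL)
        return 0;

    start += sizeof (prefix) - 1;       /* skip "#enc:" itself */
    end = strchr (start, '/');
    len = (end != NULL) ? (size_t) (end - start) : strlen (start);

    if (len == 0 || len >= enc_size)
        return 0;                       /* empty name or buffer too small */

    memcpy (enc, start, len);
    enc[len] = '\0';
    return 1;
}
```

With this sketch, a path such as "/home/user/#enc:ISO-8859-2/docs" would
yield "ISO-8859-2", and a path without the prefix would be left alone.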

I decided that it would be nice if the whole of mc worked in utf-8
everywhere, so that only one kind of string functions would be needed. But
I had not considered that the localization is not always in utf-8. I
created slightly mad functions that convert the localization to utf-8.
With them mc supported all localizations (... that I tested). Only regular
expressions were broken in non-utf8 encodings. I continued with editing
the viewer. In the viewer I changed the reading, displaying and caching
functions.
(http://www.fi.muni.cz/~xbenes5/projects/mc/mc-test.tar.gz is the last
version of the utf-8 everywhere variant)

But when I saw my edits in mc, I changed my mind. I rejected the utf-8
everywhere idea and checked out the newest version of mc. I designed an
API for strings (I had anticipated this before, so it was no big problem)
and made variants for ascii, 8-bit encodings and utf-8 (and possibly other
encodings that support backward reading). I imported the good ideas from
the previous attempt and created a final set of 30 patches. Each patch has
a short comment in mc-utf8.txt. The utf8-patch is not included in my
patches.
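
One way to picture a string API with one variant per encoding family is a
table of function pointers that is chosen once at startup. The sketch
below only shows the idea; the names str_ops, str_8bit_length,
str_utf8_length and str_choose_ops are made up for this example, and the
API in the patches has more functions and may be organized differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: one table of operations per encoding family. */
struct str_ops
{
    const char *name;
    size_t (*length) (const char *text);        /* length in characters */
};

/* ASCII and 8-bit encodings: one byte is one character. */
static size_t
str_8bit_length (const char *text)
{
    return strlen (text);
}

/* UTF-8: count every byte that is not a continuation byte (10xxxxxx).
 * Assumes text is valid UTF-8. */
static size_t
str_utf8_length (const char *text)
{
    size_t count = 0;

    for (; *text != '\0'; text++)
        if (((unsigned char) *text & 0xC0) != 0x80)
            count++;
    return count;
}

static const struct str_ops str_8bit_ops = { "8bit", str_8bit_length };
static const struct str_ops str_utf8_ops = { "utf-8", str_utf8_length };

/* Pick the implementation once, from the name of the terminal encoding.
 * Anything that is not recognized as UTF-8 is treated as 8-bit here. */
static const struct str_ops *
str_choose_ops (const char *encoding)
{
    if (strcmp (encoding, "UTF-8") == 0 || strcmp (encoding, "utf8") == 0)
        return &str_utf8_ops;
    return &str_8bit_ops;
}
```

The rest of the program would then call only through the chosen table, so
it does not need to know which encoding is active.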

separate patches -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-utf8.tar.gz
all together in one patch -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-utf8-all.tar.gz
and applied to the cvs version of mc -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-complete.tar.gz

Problems:

invalid strings - I chose a defensive approach: no invalid strings are
loaded into mc. Only panels can handle invalid file names. I'm not sure
that I found every place where invalid strings can appear. API functions
like str_next_char, str_prev_char and str_length do not support invalid
strings. Invalid strings are supported by the str_term_* functions
(formatting for drawing on screen).
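
The split described above (check strings before they are loaded, but let
the screen-formatting code tolerate bad bytes) could look roughly like the
following for utf-8. This is only a sketch under simplifying assumptions:
utf8_char_len, utf8_is_valid and term_sanitize are hypothetical names, the
validity check does not cover every overlong or surrogate form, and
replacing a bad byte with '?' is my choice for the example, not necessarily
what the str_term_* functions do.

```c
#include <stddef.h>

/* Returns the byte length (1-4) of the UTF-8 character starting at s,
 * or 0 if s does not start with a well-formed sequence.
 * Simplified: rejects overlong 2-byte forms, but does not check
 * surrogates or overlong 3- and 4-byte forms. */
static int
utf8_char_len (const unsigned char *s)
{
    int len, i;

    if (s[0] < 0x80)
        len = 1;
    else if (s[0] >= 0xC2 && s[0] <= 0xDF)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4)
        len = 4;
    else
        return 0;

    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* also stops at the terminating '\0' */
    return len;
}

/* Defensive check before loading a string: 1 if every byte belongs to
 * a well-formed character, 0 otherwise. */
static int
utf8_is_valid (const char *text)
{
    const unsigned char *s = (const unsigned char *) text;

    while (*s != '\0')
    {
        int len = utf8_char_len (s);

        if (len == 0)
            return 0;
        s += len;
    }
    return 1;
}

/* Tolerant formatting for the screen: copies text to out and replaces
 * every byte that is not part of a well-formed character with '?'.
 * out must have room for strlen (text) + 1 bytes. */
static void
term_sanitize (const char *text, char *out)
{
    const unsigned char *s = (const unsigned char *) text;

    while (*s != '\0')
    {
        int len = utf8_char_len (s);

        if (len == 0)
        {
            *out++ = '?';
            s++;
        }
        else
            while (len-- > 0)
                *out++ = (char) *s++;
    }
    *out = '\0';
}
```

A panel could then show a file name with broken bytes through the
tolerant function, while everything else refuses strings that fail the
check.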

message_handler should support multibyte characters. Right now WInput
must have its own buffer for multibyte characters, which is not ideal. I
did not modify the message handler, because it is a huge change and is not
strictly needed, but it would be better. (It would also make multibyte
hot-keys possible, but I don't know whether anyone would use them.)
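
To show what such a private buffer has to do, here is a small sketch that
collects utf-8 bytes one at a time until a whole character is available.
It assumes that the handler delivers input byte by byte; the names
mb_buffer and mb_buffer_feed are hypothetical and the code is not taken
from WInput.

```c
#include <stddef.h>

/* Hypothetical per-widget buffer; names are not taken from mc. */
struct mb_buffer
{
    unsigned char bytes[4];
    int have;                   /* bytes collected so far */
    int need;                   /* total bytes in the current character */
};

/* Feed one byte as delivered by a byte-oriented key handler.
 * Returns the number of bytes of a completed character (stored in
 * buf->bytes), 0 if more bytes are needed, or -1 if the byte is not
 * valid at this point. On -1 the buffer is reset and the offending
 * byte is dropped together with the unfinished sequence. */
static int
mb_buffer_feed (struct mb_buffer *buf, unsigned char c)
{
    if (buf->have == 0)
    {
        /* First byte of a character decides the expected length. */
        if (c < 0x80)
            buf->need = 1;
        else if ((c & 0xE0) == 0xC0)
            buf->need = 2;
        else if ((c & 0xF0) == 0xE0)
            buf->need = 3;
        else if ((c & 0xF8) == 0xF0)
            buf->need = 4;
        else
            return -1;          /* stray continuation or invalid lead byte */
    }
    else if ((c & 0xC0) != 0x80)
    {
        buf->have = 0;          /* sequence was cut short, drop it */
        return -1;
    }

    buf->bytes[buf->have++] = c;
    if (buf->have < buf->need)
        return 0;

    buf->have = 0;              /* ready for the next character */
    return buf->need;
}
```

If the message handler passed whole characters instead of bytes, this
bookkeeping would not have to live inside the widget.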

I hope that it will work for someone. I'm now trying to use it instead of
my distribution's default mc.

And now there is room for questions.

Rostislav Beneš

(Sorry about typos etc., I should go to sleep.)


