mc and utf-8 again but different
Rostislav Beneš
xbenes5 at fi.muni.cz
Mon Nov 12 02:14:10 UTC 2007
Hello,
in April I announced that I had chosen mc and UTF-8 as the topic of my
bachelor's thesis. Now I would like to present my results.
I started with the utf8-patch and tried to add support for changing the
encoding in vfs. I added a new prefix, "#enc:", for this. At first I
implemented it as a vfs_class, but there were problems with links. Then I
edited the mc_* functions of vfs directly, and that works.
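To illustrate the idea of the "#enc:" prefix, here is a minimal sketch of
how an encoding name could be pulled out of a path. It is not the code from
the patches: the helper name path_get_encoding and the assumed layout (the
prefix starts a path component, as in /home/#enc:iso-8859-2/docs) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the patches.
 * Scans `path` component by component and, for the first component that
 * starts with "#enc:", copies the text after the prefix into `out`.
 * Returns 1 on success, 0 if no such component exists or `out` is too
 * small. The path layout is an assumption made for this sketch. */
static int
path_get_encoding (const char *path, char *out, size_t out_size)
{
    static const char prefix[] = "#enc:";
    const size_t prefix_len = sizeof (prefix) - 1;
    const char *p = path;

    assert (path != NULL && out != NULL);

    while (*p != '\0') {
        const char *end;
        size_t len;

        /* skip the separators in front of the next component */
        while (*p == '/')
            p++;

        end = strchr (p, '/');
        len = (end != NULL) ? (size_t) (end - p) : strlen (p);

        if (len > prefix_len && strncmp (p, prefix, prefix_len) == 0) {
            size_t name_len = len - prefix_len;

            if (name_len >= out_size)
                return 0;       /* no room for the name plus '\0' */
            memcpy (out, p + prefix_len, name_len);
            out[name_len] = '\0';
            return 1;
        }
        p += len;
    }
    return 0;
}
```

With this sketch, "/home/#enc:iso-8859-2/docs" yields "iso-8859-2", and a
path without the prefix yields no encoding, so the caller would fall back
to the default one.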
Then I decided that it would be nice if the whole of mc worked in UTF-8
everywhere, so that only one kind of string function would be needed. But
I had not considered that the localization is not always in UTF-8. I
created slightly mad functions that convert the localization to UTF-8.
With them mc supported every localization (... that I tested). Only
regular expressions were broken in non-UTF-8 encodings.
I continued with changes to view. In view I changed the reading,
displaying and caching functions.
(http://www.fi.muni.cz/~xbenes5/projects/mc/mc-test.tar.gz, the last
version of the "UTF-8 everywhere" variant)
But when I saw my edits in mc, I changed my mind. I abandoned the UTF-8
everywhere idea and checked out the newest version of mc. I designed an
API for strings (I had anticipated it before, so this was no big problem)
and made variants for ASCII, 8-bit encodings and UTF-8 (and possibly other
encodings that support backward reading). I imported the good ideas from
the previous attempt and created a final set of 30 patches. Each patch has
a short comment in mc-utf8.txt. The utf8-patch is not part of my patches.
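The idea of one string API with a variant per encoding can be sketched as a
table of function pointers. This is only an illustration, not the API from
the patches: the struct str_ops, its field signatures and the two tables
below are assumptions. Like the real functions, the UTF-8 variant here
expects valid input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation table: one instance per encoding family. */
struct str_ops {
    const char *(*next_char) (const char *s);  /* requires *s != '\0' */
    const char *(*prev_char) (const char *s);  /* requires s > string start */
    size_t (*length) (const char *s);          /* length in characters */
};

/* --- 8-bit (and ASCII) variant: one byte is one character --- */
static const char *bit8_next (const char *s) { return s + 1; }
static const char *bit8_prev (const char *s) { return s - 1; }
static size_t bit8_length (const char *s) { return strlen (s); }

/* --- UTF-8 variant: continuation bytes look like 10xxxxxx --- */
static int
is_cont (unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

static const char *
utf8_next (const char *s)
{
    assert (*s != '\0');
    s++;
    while (is_cont ((unsigned char) *s))
        s++;
    return s;
}

/* Backward reading works because a lead byte can be recognized
 * without looking at the bytes in front of it. */
static const char *
utf8_prev (const char *s)
{
    s--;
    while (is_cont ((unsigned char) *s))
        s--;
    return s;
}

static size_t
utf8_length (const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (!is_cont ((unsigned char) *s))
            n++;
    return n;
}

static const struct str_ops str_ops_8bit = { bit8_next, bit8_prev, bit8_length };
static const struct str_ops str_ops_utf8 = { utf8_next, utf8_prev, utf8_length };
```

The caller would pick one table at startup according to the terminal
encoding and call everything through it, so the rest of the code does not
need to know which encoding is in use.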
separate patches -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-utf8.tar.gz
all together in one patch -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-utf8-all.tar.gz
and applied to the CVS version of mc -
http://www.fi.muni.cz/~xbenes5/projects/mc/mc-complete.tar.gz
Problems:
invalid strings - I chose a defensive approach: no invalid strings are
loaded into mc. Only the panels can handle invalid file names. I'm not
sure that I found every place where invalid strings can appear. API
functions like str_next_char, str_prev_char and str_length do not support
invalid strings. Invalid strings are supported by the str_term_* functions
(formatting for drawing on the screen).
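The defensive approach means that a string has to be checked before it is
accepted. A minimal sketch of such a check for UTF-8 follows. The function
utf8_is_valid is hypothetical and is not the code from the patches; it
rejects truncated sequences, stray continuation bytes, overlong forms,
surrogates and code points above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator, written for this example only.
 * Returns 1 if the `len` bytes at `s` are well-formed UTF-8, else 0. */
static int
utf8_is_valid (const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert (s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        size_t need, k;
        unsigned long cp, min;

        if (c < 0x80) {                     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* 2-byte sequence */
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {    /* 3-byte sequence */
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {    /* 4-byte sequence */
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;   /* sequence is cut off at the end of the buffer */

        for (k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];

            if ((cc & 0xC0) != 0x80)
                return 0;   /* expected a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;   /* outside Unicode range or a surrogate */

        i += need + 1;
    }
    return 1;
}
```

A string that fails such a check would either be refused or, as with file
names in the panels, be passed only to functions that tolerate invalid
input.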
message_handler should support multibyte characters. At present WInput
must have its own buffer for multibyte characters, which is not ideal. I
did not modify the message handler, because that would be a huge change
and is not strictly needed, but it would be better. (It would also make
multibyte hot-keys possible, but I don't know whether anyone would use
them.)
I hope that it will work for someone. I'm now trying to use it instead of
my distribution's default mc.
And now there is room for questions.
Rostislav Beneš
(Sorry about the typing mistakes etc., I should go to sleep.)