Multicodepage patch: comments and system default codepage in mc.charsets

Pavel Roskin proski at gnu.org
Tue Jun 5 03:49:00 UTC 2001


Hello, Andrew!

Sorry for writing to you in English and copying the list, but I want to
avoid any "conspiracy" and inform the list about the current state of
affairs.

I currently have two versions of the codepage patch - one of them is old,
but it has ifdefs and can be applied cleanly. The other one is new, but it
doesn't have ifdefs and doesn't apply to the CVS source.

I started applying the old patch before I realized that it's old. I added
the necessary infrastructure to configure.in and makefiles. I also added
the new files. The only thing that remains is applying the patch to the
already existing files.

Unfortunately, I got stuck at this point. Not only did I realize that I
was applying the old version, but I also found some problems in the
implementation.

I wrote Walery privately about it but I should have written here. I
thought he would tell you about my concerns. Let me show what I mean in
your patch.

> +#ifndef HAVE_CHASET
>      { "eight_bit_clean", &eight_bit_clean },
>      { "full_eight_bits", &full_eight_bits },
> +#endif /* HAVE_CHARSET */

That's exactly the problem. The names of those variables are confusing, but
their meaning is:

eight_bit_clean - allow symbols from 160 to 255, i.e. iso-8859-1 symbols.
Symbols 128-159 should not appear on the screen. This means that you
cannot display e.g. Russian characters in cp-866, but koi8-r and win-1251
are fine, you can even recode from one to the other.

full_eight_bits - allow symbols from 128 to 255. There are still
exceptions if MC is running on xterm, but basically you can display
everything.
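To make the two ranges concrete, here is a minimal sketch in C of the rule
described above. It is not the actual MC code: the function name
byte_is_displayable is hypothetical, the two globals only stand in for the
real settings, and I am assuming that full_eight_bits only matters when
eight_bit_clean is also set. The xterm exceptions are left out.

```c
#include <stdbool.h>

/* Stand-ins for the two settings discussed above (assumed defaults). */
static bool eight_bit_clean = true;
static bool full_eight_bits = false;

/* Hypothetical helper: may byte c be sent to the terminal unchanged? */
static bool
byte_is_displayable (unsigned char c)
{
    if (c < 32 || c == 127)
        return false;           /* ASCII control characters */
    if (c < 127)
        return true;            /* printable 7-bit ASCII */
    if (!eight_bit_clean)
        return false;           /* 7-bit only: nothing above 127 */
    if (full_eight_bits)
        return true;            /* whole 128-255 range allowed */
    return c >= 160;            /* only 160-255, i.e. the iso-8859-1 range */
}
```

With eight_bit_clean set and full_eight_bits unset, byte 200 passes and
byte 140 does not, which is why cp-866 text cannot be shown in that mode.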

We have a big design flaw in MC - those settings should be specific to
the terminal. Another option, force_ugly_line_drawing, which is controlled
by the "-a" switch on the command line, should actually be in the same
group and be persistent for the given terminal name.

Please note that all those options make sense even if the charset patch is
applied. You cannot describe output properties of your terminal by
specifying the encoding. If you specify that the terminal uses koi8-r, one
may assume that _some_ 8-bit symbols are allowed, but there is still
uncertainty whether the chars between 128 and 159 can be used in the
output.

On the other hand, recoding from koi8-r to US-ASCII is not the same as not
displaying chars in the 128-255 region. Recoding will give you English
letters in place of Russian letters. Merely disabling the 8-bit output will
give you dots in place of 128-255 characters. English words will be much
more visible in this setup - essentially MC will behave like the "strings"
command.
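The "dots" behaviour can be sketched in a few lines of C. This is only an
illustration; mask_high_bytes is a hypothetical name, not a function from
MC. Recoding to US-ASCII would instead need a per-charset transliteration
table, which is not shown here.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy src into dst, replacing every byte in the
 * 128-255 range with '.'.  At most dst_size - 1 bytes are copied and
 * the result is always NUL-terminated when dst_size > 0.
 */
static void
mask_high_bytes (const unsigned char *src, char *dst, size_t dst_size)
{
    size_t i = 0;

    if (dst_size == 0)
        return;
    for (; src[i] != '\0' && i + 1 < dst_size; i++)
        dst[i] = (src[i] >= 128) ? '.' : (char) src[i];
    dst[i] = '\0';
}
```

Two koi8-r letters followed by " mc" come out as ".. mc": the 7-bit text
survives untouched and everything else collapses to dots.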

I don't want to remove any options. They will be used differently, they
will be stored differently, but they make sense and should not be removed.

> +#ifdef HAVE_CODESET
> +     save_string( "Misc", "display_codepage",
> +    		  get_codepage_id( display_codepage ), profile_name );
> +#endif /* HAVE_CODESET */

I understand that you are not using the terminal name either. It shouldn't
be hard to do it right. Just put the settings in the same section as the
key bindings:

[terminal:xterm]
end=\e[8~
home=\e[7~
display_codepage=koi8-r

I think it should work and MC shouldn't be confused.
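As a rough sketch of what I mean, the section name could be built from the
terminal name and then passed to save_string() in place of "Misc". The
helper terminal_section_name and the "unknown" fallback are my assumptions,
not existing MC code.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: write "terminal:<name>" into buf.
 * Falling back to "unknown" for a missing name is an assumption.
 */
static void
terminal_section_name (const char *term, char *buf, size_t buf_size)
{
    if (term == NULL || *term == '\0')
        term = "unknown";
    snprintf (buf, buf_size, "terminal:%s", term);
}
```

The caller would pass getenv ("TERM") as the first argument, so the
codepage ends up next to the key bindings for that terminal.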

It's fine with me if you apply whatever patches you need for the charset
support as long as the code compiled without HAVE_CHARSET is not affected.
But please think ahead - it will save us some time.

I'm not going to touch the charset support - I'll continue random
bugfixing and work on the high-priority issues.

-- 
Regards,
Pavel Roskin




