zh_TW.Big5.po: illegal control sequence

Pablo Saratxaga pablo at mandrakesoft.com
Tue May 22 11:51:03 UTC 2001


Kaixo!

On Tue, May 22, 2001 at 02:33:28AM -0400, Pavel Roskin wrote:

> > > It's one more restriction on the build system for CVS users.
> > is it such a hard restriction?
> 
> I think it is. I don't know about other distributions, but RedHat 7.1
> ships 0.10.35.

Yes, but Red Hat 7.1 already shipped with Gnome 1.4.
I mean, what is currently in CVS should be designed with the future in mind.
Note also that the compiled mo format is the same; that is, the difference
is only for building the package. There is no difference for the user:
a package built with gettext 0.10.35 can be used on a Red Hat 7.1 without
problems.

> I personally would never ask people to upgrade some software to make MC
> compile because of some issues in a translation that most people on this
> list are highly unlikely to use. But since you took this responsibility,
> let's go ahead with the new gettext.

In fact the new gettext is correct: a file claiming to be big5 uses
big5 text.
It is the old behaviour that was wrong (a file claiming to be big5 was not
in big5 encoding).
The problem is that the two behaviours are mutually exclusive.

The new gettext solves that very long-standing problem, and that is a good
thing, and it should compile without problems on systems that compiled
the previous versions of gettext.

> I tried gettext 0.10.36 before, and it would report an error on the Tamil
> translation. Now it's just a warning, which is good.

The list of "good" encodings is hardcoded in the source; imho it could be
expanded a bit (I did a small patch in the version used in Mandrake, to avoid
seeing those annoying messages).
A command-line option to silence them would be nice too.

> I also tested the charset translation with 0.10.37. MC works fine with
> LANG=ru_RU.ISO8859-5 on rxvt with an iso-8859-5 font. But the hints are
> displayed incorrectly, because they are not using gettext.

What hints, the icon ones? That is done in the *.desktop files, and
needs different handling.
Imho it should be done like this (I have already used that method
successfully with other programs):

- change all the files to use utf-8 encoding
- have the program assume the files are in utf-8 and try a conversion from
  utf-8 to the user charset, if the conversion works, display the result;
  if it fails, assume the string was in the user encoding and display it
  as is (that is the current behaviour).

When everything is converted to utf-8 there won't be any problem anymore.
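
As a rough sketch of the fallback described above (Python is used only for
illustration, this is not code from mc, and the function name to_display is
made up for the example):

```python
def to_display(raw: bytes, user_charset: str) -> bytes:
    """Return the bytes to show on a terminal that uses user_charset.

    Hypothetical helper: assume the file is utf-8 first, and fall back
    to the raw bytes when that assumption does not hold.
    """
    try:
        text = raw.decode("utf-8")        # assume the file is utf-8
        return text.encode(user_charset)  # convert to the user charset
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Not valid utf-8, or not representable in the user charset:
        # assume it was already in the user encoding and show it as is.
        return raw
```

With this, a utf-8 string would be re-encoded for an iso-8859-5 terminal,
while a string already stored in a legacy 8-bit encoding would pass through
unchanged.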

The problem is that those files don't give any information about the
encoding used (po files do, in the charset= line), so it is not possible to
convert encodings automatically.
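
To illustrate the difference, here is a minimal sketch of reading the
declared charset from a po file header (Python for illustration only; the
function name po_charset is hypothetical, and the regular expression is a
simplification that just looks for the first charset= token):

```python
import re
from typing import Optional

def po_charset(raw: bytes) -> Optional[str]:
    """Return the charset named in a po header, or None if there is none.

    The header line normally looks like:
        "Content-Type: text/plain; charset=big5\\n"
    It is plain ASCII, so it can be searched before the encoding is known.
    """
    match = re.search(rb"charset=([A-Za-z0-9_.:-]+)", raw)
    return match.group(1).decode("ascii") if match else None
```

The *.desktop files have nothing comparable to look up, which is why some
other way of recognizing the encoding is needed there.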

Using utf-8 has advantages for that:
- it has a very recognizable byte pattern; that is, you can tell whether
  a text is utf-8 or not
- once you know it is utf-8, you know the source encoding and can convert
- it is an encoding that can work as source encoding for all languages
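
The first point can be sketched as a simple validity check (Python for
illustration only; looks_like_utf8 is a made-up name). It relies on the
decoder rejecting byte sequences that do not follow the utf-8 lead and
continuation byte rules, which text in legacy 8-bit encodings usually
violates:

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed utf-8 byte sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that pure ASCII is also valid utf-8, so the check only tells something
interesting for strings that contain bytes above 0x7F, and a legacy string
could still pass it by coincidence.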

> Unfortunately, gmc doesn't work correctly with LANG=ru_RU.ISO8859-5. Maybe
> it's fixed in the head branch of Gtk+.

It may be an X11 or gtkrc problem.
When you launch LANG=ru_RU.ISO8859-5 gmc in an xterm, are there any error
messages?
What does 'grep ru_RU /usr/X11R6/lib/X11/locale/locale*' give?
What are the contents of /etc/gtk/gtkrc.ru* and /usr/share/gtk/gtkrc.ru* ?

(note this is not related to the version of gettext used to compile)

>>> I'd rather be invalid with 0.10.37 than illegal with 0.10.35 <grin>
>>
>> it could be converted to utf-8.
> 
> Only if it's convenient for our Taiwanese colleagues.
> 
> > the problem will then be: are there some systems that have a gettext()
> > implementation that can't convert between po encoding and user encoding
> > (eg: po file in utf-8, runtime display in big5) ?
> 
> The "--with-included-gettext" option would help, provided that we upgrade
> the "intl" directory to 0.10.37. However, I'll only do it after GMC is
> branched off. I believe that all GNOME programs should use the same
> gettext and that the transition should be performed for all GNOME
> software in the same time.

So, which do you think is better: using big5 encoding and requiring the new
gettext, or using utf-8 and requiring the internal gettext on some systems
(but on those systems, don't they also need GNU gettext to compile the po
files?).

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975



