Wrong interpretation of a UTF-8 character if it lies on a 0x8000 boundary
ilias iliadis
simsonbike-bugs at yahoo.gr
Mon Jul 25 12:20:56 UTC 2011
First, the versions (from two different PCs):
mc -V
GNU Midnight Commander 4.7.0
Virtual File System: tarfs, extfs, cpiofs, ftpfs, fish
With builtin Editor
Using system-installed S-Lang library with terminfo database
With subshell support as default
With support for background operations
With mouse support on xterm and Linux console
With internationalization support
With multiple codepages support
Data types: char 8 int 32 long 64 void * 64 off_t 64 ecs_char 8
mc -V
GNU Midnight Commander 4.7.0.9
Virtual File System: tarfs, extfs, cpiofs, ftpfs, fish, undelfs
With builtin Editor
Using system-installed S-Lang library with terminfo database
With subshell support as default
With support for background operations
With mouse support on xterm and Linux console
With support for X11 events
With internationalization support
With multiple codepages support
Data types: char 8 int 32 long 32 void * 32 off_t 64 ecs_char 8
Problem:
In a text file, if a multi-byte UTF-8 character (a code point above U+00FF, such as GREEK SMALL LETTER TAU, encoded as 0xCF 0x84) straddles a 0x8000 byte boundary, it is misinterpreted as two separate characters and is displayed incorrectly in view mode (F3) as two dots (..).
Attached is a file split off from a larger el.wiktionary file.
You can see the difference if you open the attached txt file (after unzipping) in gedit and in mc. In mc you see ".." instead of "τ" at position 0x8000.
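
For illustration only, here is a minimal Python sketch of how this kind of symptom can arise. It assumes (this is a guess, not something checked against the mc source) that the viewer reads the file in 0x8000-byte chunks and decodes each chunk on its own. The names CHUNK, decode_per_chunk and decode_incremental are hypothetical and exist only in this sketch.

```python
import codecs

CHUNK = 0x8000  # assumed chunk size, taken from the offset in this report

TAU = "\u03c4"  # GREEK SMALL LETTER TAU, UTF-8 bytes 0xCF 0x84

# Place tau so that 0xCF is the last byte of the first chunk
# and 0x84 is the first byte of the second chunk.
data = b"a" * (CHUNK - 1) + TAU.encode("utf-8") + b"b"


def decode_per_chunk(buf, size=CHUNK):
    """Decode every chunk independently; undecodable bytes become '.'."""
    out = []
    for start in range(0, len(buf), size):
        chunk = buf[start:start + size]
        text = chunk.decode("utf-8", errors="replace")
        out.append(text.replace("\ufffd", "."))
    return "".join(out)


def decode_incremental(buf, size=CHUNK):
    """Decode chunk by chunk, carrying an incomplete sequence over."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = []
    for start in range(0, len(buf), size):
        out.append(decoder.decode(buf[start:start + size]))
    out.append(decoder.decode(b"", final=True))
    return "".join(out)


naive = decode_per_chunk(data)
fixed = decode_incremental(data)

print(repr(naive[CHUNK - 1:CHUNK + 1]))  # the two characters at the boundary
print(repr(fixed[CHUNK - 1]))            # the single character at the boundary
```

If the assumption holds, the per-chunk decoder sees a truncated sequence at the end of one chunk and a stray continuation byte at the start of the next, which would match the two dots shown in view mode. The incremental decoder keeps the pending lead byte between chunks and yields the single character.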
Thanks
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testmc.zip
Type: application/zip
Size: 5197 bytes
Desc: not available
URL: <http://lists.midnight-commander.org/pipermail/mc-devel/attachments/20110725/39d3c70b/attachment.zip>