Terminology concerning strings
Leonard den Ottolander
leonard at den.ottolander.nl
Wed Apr 6 10:36:26 UTC 2005
Hi Egmont,
On Mon, 2005-04-04 at 14:36, Koblinger Egmont wrote:
> On Mon, Apr 04, 2005 at 11:35:44AM +0200, Roland Illig wrote:
>
> > * the _size_ of a string (as well as for other objects) is the number of
> > bytes that is allocated for it. For arrays, it is the number of
> > entries of the array. For strings it is at least _length_ + 1.
> >
> > * the _length_ of a string is the number of characters in it, excluding
> > the terminating '\0'.
> It seems to me that this terminology is not yet multibyte-aware. Since UTF-8
> becomes an everyday issue and AFAIR is planned for mainstream mc 4.7.0, IMHO
> it is very important to create a clear terminology for this even if it's not
> yet officially implemented now.
It seems you haven't read Roland's post very carefully. He clearly
differentiates between size (raw number of bytes) and length (number of
characters represented on the screen). From discussions with him I know
he wrote this post explicitly with multibyte charsets in mind. "ecs" in
ecssup.{c,h} stands for "extended charset".
Or am I missing your point?
Leonard.
--
mount -t life -o ro /dev/dna /genetic/research