Quick quiz! Given the following:
void f(unsigned int); void f(int); void f(char);
Which overload gets called by the following?
char x = 1; char y = 2; f(x + y);
Alternatives:
1) f(unsigned int)
2) f(int)
3) f(char)
4) No-one knows the type of char + char

If you answered 4), congratulations! And if you answered 2), maybe you tried the code on your own computer? Most people will get f(int) when they try this code, but this is actually not specified by the standard. The only thing we know for sure is that it's not 3), f(char)!
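Here's a runnable version of the quiz, with simple bodies added just so the program can print which overload was chosen (the bodies are not part of the original question):

#include <cstdio>

void f(unsigned int) { std::puts("f(unsigned int)"); }
void f(int)          { std::puts("f(int)"); }
void f(char)         { std::puts("f(char)"); }

int main() {
    char x = 1;
    char y = 2;
    f(x + y);  // prints "f(int)" on most implementations, but that's not guaranteed
}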
Let’s have a look at what’s going on:
Before being passed to operator +, the operands (x and y) go through a conversion. [expr.add]§8.7¶1:
The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.
What are “the usual arithmetic conversions”?
[expr]§8¶11:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
– [a bunch of rules for floats, enums etc]
– Otherwise, the integral promotions (7.6) shall be performed on both operands
So both chars go through integral promotions. Those are defined in [conv.prom]§7.6¶1:
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
So a char gets converted to an int if int can fit all possible values of a char. If not, it gets converted to unsigned int. But any char should fit in an int, right? As it turns out, that's not necessarily the case.
First, int could actually be the same size as char. [basic.fundamental]§6.9.1¶2:
There are five standard signed integer types: “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list.
Note that it says “at least as much storage”; it doesn't have to be more. So for instance you could have a sixteen-bit system where both char and int are sixteen bits.
Second, char can be either signed or unsigned; it's up to the implementation. [basic.fundamental]§6.9.1¶1:
It is implementation-defined whether a char object can hold negative values.
int is signed, so if char is also signed, all possible values of char will fit in an int. However, if char is unsigned and int and char are the same size, char can actually hold larger values than int!
Let's see an example. If char and int are both sixteen bits, int (which is always signed) can hold [-32768, 32767]. If char is signed, it can also hold [-32768, 32767], and any char fits in an int. However, if char is unsigned, it can hold [0, 65535], half of which fall outside the range of int!
In the former case, chars get promoted to ints, but in the latter case, chars get promoted to unsigned ints before being summed.
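A quick way to check what happens on your own system is to inspect the type of char + char at compile time. Here's a small sketch (assuming C++17 for std::is_same_v):

#include <type_traits>

char a = 1;
char b = 2;
// Exactly one of the two alternatives holds, depending on whether int can
// represent every value of char on your platform.
static_assert(std::is_same_v<decltype(a + b), int> ||
              std::is_same_v<decltype(a + b), unsigned int>,
              "char + char promotes to int or unsigned int");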
So in practice, most systems will call f(int), but some might call f(unsigned int), and both would be conforming to the standard.
If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.
The same is true for short.
Are you sure? Short is guaranteed to be signed, so how can this happen to short?
I have tried it with uint8_t on both g++ and clang++, and both passed it to the int version of the function.
Yeah I’m pretty sure this could not happen to short.
However, as for anything related to undefined, unspecified or implementation defined behaviour, trying it out doesn’t really help. If you try this out with char in g++ and clang++, they would both pick the int version too, and you wouldn’t notice the issue.
IMHO char + char should be an illegal expression. It doesn't hold meaning, as char itself is not a numeric type. It is meant to hold a character. char and signed char are not the same type. It's like asking what's 'a' + 'b'. The logical answer, if you'd ask me, is 'ab'.
I'm on my phone so I can't really try it, but what result do you get when you do this with signed char (or int8_t, which is defined as signed char)?
char + char is a fine expression with meaning – char is a numeric type, but even if you wanted to treat it strictly as a singular character, allowing (char)('0' + some_num), where some_num can be any numeric type including a char, is convenient for converting a single-digit number into an ASCII character. There are a number of other uses for taking a single digit and putting it in the bounds of ASCII printable characters.
Interesting take. But the `char + some_num` example makes sense. What about `char - char`, would you allow that? It could be useful to allow checking the “distance” between two characters?
If you use `signed char`, you're guaranteed to get `int`. (`unsigned char` follows the same rule as plain `char`: it promotes to `int` if `int` can represent all its values, otherwise to `unsigned int`.)
(Btw., don’t try these things out to find the answer, you’ll get bitten by implementation defined behavior.)
I think that char - char should definitely be legal. The distance between characters is well defined. Same for char + numeric. Both logically make sense. I think a good analogy might be floors in a building. Asking what's the distance between the second and seventh floor makes sense, or what's two floors above the 4th. But the question ‘what's the 5th floor plus the 6th floor’ doesn't make sense.
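To make the two uses discussed above concrete, here's a small sketch (digit contiguity is guaranteed by the standard; the letter distance of 6 assumes ASCII):

#include <cassert>

int main() {
    int some_num = 7;                                // a single decimal digit, 0-9
    char digit = static_cast<char>('0' + some_num);  // decimal digits are contiguous, so this is '7'
    assert(digit == '7');

    int distance = 'g' - 'a';  // char - char: the "distance" between two characters
    assert(distance == 6);     // holds in ASCII; letter contiguity isn't guaranteed in general
}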
BTW, this is usually a non-issue as the compiler is happy to implicitly cast between char and signed char. Now try it with std::vector<char> and std::vector<signed char>.
**Affine space** describes these kinds of relationships in mathematics. E.g. position and displacement in n dimensions, or count and offset in buffers, even timestamp and duration.
Yes, this is also nicely done in e.g. `std::chrono`. I think using proper affine space types for fundamentals like `char` is unlikely though, so we need to figure out what’s best to do for those.
If we want to allow taking the difference between two chars, but not summing them, I think we’ll complicate the fundamental type system too much. So I think it’s better to just allow normal arithmetic for `char`.
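As a point of reference, here's a minimal sketch of how `std::chrono` models the affine idea: subtracting two time points gives a duration, adding a duration to a time point gives a new time point, and adding two time points doesn't compile.

#include <chrono>

int main() {
    using namespace std::chrono;
    auto t1 = steady_clock::now();
    auto t2 = steady_clock::now();

    auto elapsed = t2 - t1;          // time_point - time_point => duration
    auto later   = t1 + seconds(1);  // time_point + duration   => time_point
    // auto nonsense = t1 + t2;      // error: no operator+ for two time_points
    (void)elapsed;
    (void)later;
}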
FYI: Herb Sutter talks about this issue and this article on the current cppcast: http://cppcast.com/2019/05/herb-sutter/
Seems like he and I are in agreement.
Yes, I agree too. If we were to design the language again from scratch, with proper unicode support and a unicode library to back it, `char` would mean something else entirely than it does now, and should not be an arithmetic type. However, since we only have a very primitive character type, I think it makes sense to be able to do certain arithmetic operations on it.
Thanks for the tip, I’ll check it out!
IMHO, you should avoid using C or C++. Try Pascal or Common Lisp.
Consider this:
char i = 0;
static_assert(std::is_same_v<decltype(i), char>);  // OK
static_assert(std::is_same_v<decltype(+i), char>); // ERROR: +/- applied to char gives int
It seems that simple application of a unary operator changes the type of an expression. Probably, backward compatibility with C makes this hole in the type system legal.
Code was clipped. Copy here: http://coliru.stacked-crooked.com/a/3f857fb27eedd96d
Integral promotion is performed for unary plus, in the same way it is done for the additive operator described in the blog post.
What exactly do you mean by a “hole in the type system”?
A decent type system shall not allow constructions like char + int implicitly, IMHO. At least, the compiler could emit a warning if one is going to be done behind the scenes and the results may be wrong/surprising.
I agree, the C / C++ type systems are too eager to do implicit conversions.
I think it's not just eagerness. Some things become logically weird pretty fast. If you want it to be more complete, you should be able to cast containers.
If this can work:
signed char sc = 123;
char c = sc;
then I think this should work:
std::vector<signed char> v_sc { 1, 2, 3 };
std::vector<char> v_c = v_sc;
If it's implicit, it should be implicit all the way through.
I’m not sure I agree.
For widening conversions: Implicitly converting e.g. a short to an int is fine; there's no overhead and you won't lose data. However, if you want to allow implicit conversions from vector<short> to vector<int>, you'll implicitly allocate memory for an entirely new vector, and copy and convert all the data. I think that's bad.
For narrowing conversions: I don’t think these should be implicit in the first place. But even if they are, allowing this for containers would be even worse, since you get allocation and copying in addition to the narrowing.
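For comparison, here's what the conversion looks like when spelled out explicitly (a sketch): constructing a new vector from the old one's range makes the allocation and per-element conversion visible at the call site.

#include <vector>

int main() {
    std::vector<signed char> v_sc { 1, 2, 3 };
    std::vector<char> v_c(v_sc.begin(), v_sc.end());  // explicit allocation + per-element conversion
}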
It's no doubt a weird example, where you can just use the same vector and things will accidentally work. Your examples should not be allowed by default. Maybe we're lacking verbosity in describing the conversions.
Yes, this particular example would work if you just treated the `vector<signed char>` as a `vector<char>` instead of constructing a new vector.
But this is actually very different from implicit conversions! When you implicitly convert a `signed char` to a `char`, the new `char` actually has the type `char`. An actual type conversion is performed and a new object is created. If you want to just re-use the object and pretend it’s of another type, that’s something else which I don’t see how we could allow.
Nitpick: Last paragraph mentions that functions will print ‘i’ or ‘u’, but there’s no code for bodies of the functions.
Thanks for the heads up! It’s a leftover from an earlier version where the functions had definitions too. Fixed now.
I don't know specifically about C++, but based on K&R, my response was int + int. In fact, atoi (ascii to integer) relies on being able to subtract two chars. Since C++ honors C syntax, I'd expect this reasoning to work in C++ as well. Also see the discussion in K&R as to whether chars are signed or unsigned (section 2.7, page 43, last paragraph).