No-one knows the type of char + char


Quick quiz! Given the following:

void f(unsigned int);
void f(int);
void f(char);

Which overload gets called by the following?

char x = 1;
char y = 2;
f(x + y);

Alternatives:

  1. f(unsigned int)
  2. f(int)
  3. f(char)
  4. No-one knows the type of char + char

If you answered 4), congratulations! And if you answered 2), maybe you tried the code on your own computer? Most people will get f(int) when they try this code, but this is actually not specified by the standard. The only thing we know for sure is that it’s not 3), f(char)!

Let’s have a look at what’s going on:

Before being passed to operator +, the operands (x and y) go through a conversion. [expr.add]§8.7¶1:

The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.

What are “the usual arithmetic conversions”?

[expr]§8¶11:

Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
– [a bunch of rules for floats, enums etc]
– Otherwise, the integral promotions (7.6) shall be performed on both operands

So both chars go through integral promotions. Those are defined in [conv.prom]§7.6¶1:

A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

So a char gets converted to an int if int can represent all possible values of char. If not, it gets converted to unsigned int. But any char should fit in an int, right? As it turns out, that’s not necessarily the case.

First, int could actually be the same size as char. [basic.fundamental]§6.9.1¶2:

There are five standard signed integer types : “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list.

Note that it says “at least as much storage”; it doesn’t have to be more. So for instance you could have a sixteen-bit system where both char and int are sixteen bits.

Second, char can be either signed or unsigned; it’s up to the implementation. [basic.fundamental]§6.9.1¶1:

It is implementation-defined whether a char object can hold negative values.

int is signed, so if char is also signed, all possible values of char will fit in an int. However, if char is unsigned, and int and char are the same size, char can actually hold larger values than int!

Let’s see an example. If char and int are both sixteen bits, int (which is always signed) can hold [-32768, 32767]. If char is signed, it can also hold [-32768, 32767], and any char fits in an int. However, if char is unsigned, it can hold [0,65535], half of which fall outside the range of int!

In the former case, chars get promoted to ints, but in the latter case, chars get promoted to unsigned ints before being summed.

So in practice, most systems will call f(int), but some might call f(unsigned int), and they would both be conforming to the standard.


25 thoughts on “No-one knows the type of char + char”

      1. I have tried it with uint8_t in both g++ and clang++, and both called the int version of the function.

        1. Yeah I’m pretty sure this could not happen to short.

          However, as for anything related to undefined, unspecified or implementation defined behaviour, trying it out doesn’t really help. If you try this out with char in g++ and clang++, they would both pick the int version too, and you wouldn’t notice the issue.

  1. IMHO char + char should be an illegal expression. It doesn’t hold meaning as char itself is not a numeric type. It is meant to hold a character. char and signed char are not the same type. It’s like asking what’s ‘a’ + ‘b’. The logical answer if you’d ask me is ‘ab’.
    I’m on my phone so I can’t really try it, but what result do you get when you do this with signed char (or int8_t, which is defined to signed char)?

    1. char + char is a fine expression with meaning – char is a numeric type, but even if you wanted to treat it strictly as a singular character, allowing (char)(‘0’ + some_num) where some_num can be any numeric type including a char is convenient for converting a singular digit number into an ASCII character number. There are a number of other uses for taking a local digit and putting it in the bounds of ASCII printable characters.

    2. Interesting take. But the `char + some_num` example makes sense. What about `char - char`, would you allow that? It could be useful for checking the “distance” between two characters.

      If you use `signed char`, you’re guaranteed to get `int`. With `unsigned char`, you get `int` whenever `int` can represent all of its values (the usual case), and `unsigned int` otherwise.

      (Btw., don’t try these things out to find the answer, you’ll get bitten by implementation defined behavior.)

      1. I think that char - char should definitely be legal. The distance between characters is well defined. Same for char + numeric. Both logically make sense. I think a good analogy might be floors in a building. Asking what’s the distance between the second and seventh floor makes sense, or what’s two floors above the 4th. But the question ‘what’s the 5th floor plus the 6th floor’ doesn’t make sense.

        BTW, this is usually a non-issue as the compiler is happy to implicitly convert between char and signed char. Now try it with std::vector<char> and std::vector<signed char>.

        1. **Affine space** describes this kind of relationship in mathematics, e.g. position and displacement in n dimensions, count and offset in buffers, or even timestamp and duration.

          1. Yes, this is also nicely done in e.g. `std::chrono`. I think using proper affine space types for fundamentals like `char` is unlikely though, so we need to figure out what’s best to do for those.

            If we want to allow taking the difference between two chars, but not summing them, I think we’ll complicate the fundamental type system too much. So I think it’s better to just allow normal arithmetic for `char`.

        1. Yes, I agree too. If we were to design the language again from scratch, with proper unicode support and a unicode library to back it, `char` would mean something else entirely than it does now, and should not be an arithmetic type. However, since we only have a very primitive character type, I think it makes sense to be able to do certain arithmetic operations on it.

  2. Consider this:

    #include <type_traits>

    char i = 0;
    static_assert(std::is_same_v<decltype(i), char>);  // OK
    static_assert(std::is_same_v<decltype(+i), char>); // ERROR: +/- applied to char gives int

    It seems that simply applying a unary operator changes the type of an expression. Probably, backward compatibility with C makes this hole in the type system legal.

      1. Integral promotion is performed for unary plus, in the same way it is done for the additive operator described in the blog post.

        What exactly do you mean by a “hole in the type system”?

        1. A decent type system should not allow constructions like Char + Int implicitly, IMHO. At the very least, the compiler could emit a warning if one is going to be done behind the scenes and the results may be wrong or surprising.

            1. I think it’s not just eagerness. Some things become logically weird pretty fast. If you want it to be more complete, you should be able to convert containers.
              If this can work:

              signed char sc = 123;
              char c = sc;

              then I think this should work:

              std::vector<signed char> v_sc { 1, 2, 3 };
              std::vector<char> v_c = v_sc;

              If it’s implicit it should be implicit all the way through

              1. I’m not sure I agree.

                For widening conversions: Implicitly converting e.g. a short to an int is fine; there’s no overhead and you won’t lose data. However, if you want to allow implicit conversions from a vector of one element type to a vector of another, you’ll implicitly allocate memory for an entirely new vector, and copy and convert all the data. I think that’s bad.

                For narrowing conversions: I don’t think these should be implicit in the first place. But even if they are, allowing this for containers would be even worse, since you get allocation and copying in addition to the narrowing.

              2. It’s no doubt a weird example, where you can just use the same vector and things will accidentally work. Your examples should not be allowed by default. Maybe we’re lacking verbosity in describing the conversions.

              3. Yes, this particular example would work if you just treated the `vector<signed char>` as a `vector<char>` instead of constructing a new vector.

                But this is actually very different from implicit conversions! When you implicitly convert a `signed char` to a `char`, the new `char` actually has the type `char`. An actual type conversion is performed and a new object is created. If you want to just re-use the object and pretend it’s of another type, that’s something else which I don’t see how we could allow.

  3. Nitpick: Last paragraph mentions that functions will print ‘i’ or ‘u’, but there’s no code for bodies of the functions.
