Quick quiz! Given the following:
void f(unsigned int); void f(int); void f(char);
Which overload gets called by the following?
char x = 1; char y = 2; f(x + y);
Alternatives:
1) f(unsigned int)
2) f(int)
3) f(char)
4) No-one knows the type of char + char

If you answered 4), congratulations! And if you answered 2), maybe you tried the code on your own computer? Most people will get f(int) when they try this code, but this is actually not specified by the standard. The only thing we know for sure is that it's not 3), f(char)!
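Here's a runnable version of the quiz, with simple bodies added just so the program can print which overload was chosen (the bodies are not part of the original question):

#include <cstdio>

void f(unsigned int) { std::puts("f(unsigned int)"); }
void f(int)          { std::puts("f(int)"); }
void f(char)         { std::puts("f(char)"); }

int main() {
    char x = 1;
    char y = 2;
    f(x + y);  // prints "f(int)" on most implementations, but that's not guaranteed
}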
Let’s have a look at what’s going on:
Before being passed to operator +, the operands (x and y) go through a conversion. [expr.add]§8.7¶1:
The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.
What are “the usual arithmetic conversions”?
[expr]§8¶11:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
– [a bunch of rules for floats, enums etc]
– Otherwise, the integral promotions (7.6) shall be performed on both operands
So both chars go through integral promotions. Those are defined in [conv.prom]§7.6¶1:
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
So a char gets converted to an int if int can fit all possible values of a char. If not, it gets converted to unsigned int. But any char should fit in an int, right? As it turns out, that's not necessarily the case.
First, int could actually be the same size as char. [basic.fundamental]§6.9.1¶2:
There are five standard signed integer types: “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list.
Note that it says “at least as much storage”; it doesn't have to be more. So for instance you could have a sixteen-bit system where both char and int are sixteen bits.
Second, char can be either signed or unsigned; it's up to the implementation. [basic.fundamental]§6.9.1¶1:
It is implementation-defined whether a char object can hold negative values.
int is signed, so if char is also signed, all possible values of char will fit in an int. However, if char is unsigned and int and char are the same size, char can actually hold larger values than int!
Let's see an example. If char and int are both sixteen bits, int (which is always signed) can hold [-32768, 32767]. If char is signed, it can also hold [-32768, 32767], and any char fits in an int. However, if char is unsigned, it can hold [0, 65535], half of which fall outside the range of int!
In the former case, chars get promoted to ints, but in the latter case, chars get promoted to unsigned ints before being summed.
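A quick way to check what happens on your own system is to inspect the type of char + char at compile time. Here's a small sketch (assuming C++17 for std::is_same_v):

#include <type_traits>

char a = 1;
char b = 2;
// Exactly one of the two alternatives holds, depending on whether int can
// represent every value of char on your platform.
static_assert(std::is_same_v<decltype(a + b), int> ||
              std::is_same_v<decltype(a + b), unsigned int>,
              "char + char promotes to int or unsigned int");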
So in practice, most systems will call f(int), but some might call f(unsigned int), and both would be conforming to the standard.
If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.
The same is true for short.
Are you sure? Short is guaranteed to be signed, so how can this happen to short?
I have tried it with uint8_t on both g++ and clang++, and both passed it to the int version of the function.
Yeah I’m pretty sure this could not happen to short.
However, as for anything related to undefined, unspecified or implementation defined behaviour, trying it out doesn’t really help. If you try this out with char in g++ and clang++, they would both pick the int version too, and you wouldn’t notice the issue.
IMHO char + char should be an illegal expression. It doesn't hold meaning, as char itself is not a numeric type. It is meant to hold a character. char and signed char are not the same type. It's like asking what's 'a' + 'b'. The logical answer, if you'd ask me, is 'ab'.
I'm on my phone so I can't really try it, but what result do you get when you do this with signed char (or int8_t, which is defined as signed char)?
char + char is a fine expression with meaning – char is a numeric type, but even if you wanted to treat it strictly as a singular character, allowing (char)('0' + some_num), where some_num can be any numeric type including a char, is convenient for converting a single-digit number into an ASCII character. There are a number of other uses for taking a single digit and putting it in the bounds of ASCII printable characters.
Interesting take. But the `char + some_num` example makes sense. What about `char - char`, would you allow that? It could be useful to allow checking the “distance” between two characters?
If you use `signed char`, you're guaranteed to get `int`. (`unsigned char` follows the same rule as plain `char`: it promotes to `int` if `int` can represent all its values, otherwise to `unsigned int`.)
(Btw., don’t try these things out to find the answer, you’ll get bitten by implementation defined behavior.)
I think that char - char should definitely be legal. The distance between characters is well defined. Same for char + numeric. Both logically make sense. I think a good analogy might be floors in a building. Asking what's the distance between the second and seventh floor makes sense, or what's two floors above the 4th. But the question ‘what's the 5th floor plus the 6th floor’ doesn't make sense.
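To make the two uses discussed above concrete, here's a small sketch (digit contiguity is guaranteed by the standard; the letter distance of 6 assumes ASCII):

#include <cassert>

int main() {
    int some_num = 7;                                // a single decimal digit, 0-9
    char digit = static_cast<char>('0' + some_num);  // decimal digits are contiguous, so this is '7'
    assert(digit == '7');

    int distance = 'g' - 'a';  // char - char: the "distance" between two characters
    assert(distance == 6);     // holds in ASCII; letter contiguity isn't guaranteed in general
}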
BTW, this is usually a non-issue as the compiler is happy to implicitly cast between char and signed char. Now try it with std::vector<char> and std::vector<signed char>.
**Affine space** describes these kinds of relationships in mathematics. E.g. position and displacement in n dimensions, or count and offset in buffers, even timestamp and duration.
Yes, this is also nicely done in e.g. `std::chrono`. I think using proper affine space types for fundamentals like `char` is unlikely though, so we need to figure out what’s best to do for those.
If we want to allow taking the difference between two chars, but not summing them, I think we’ll complicate the fundamental type system too much. So I think it’s better to just allow normal arithmetic for `char`.
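As a point of reference, here's a minimal sketch of how `std::chrono` models the affine idea: subtracting two time points gives a duration, adding a duration to a time point gives a new time point, and adding two time points doesn't compile.

#include <chrono>

int main() {
    using namespace std::chrono;
    auto t1 = steady_clock::now();
    auto t2 = steady_clock::now();

    auto elapsed = t2 - t1;          // time_point - time_point => duration
    auto later   = t1 + seconds(1);  // time_point + duration   => time_point
    // auto nonsense = t1 + t2;      // error: no operator+ for two time_points
    (void)elapsed;
    (void)later;
}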
FYI: Herb Sutter talks about this issue and this article on the current cppcast: http://cppcast.com/2019/05/herb-sutter/
Seems like he and I are in agreement.
Yes, I agree too. If we were to design the language again from scratch, with proper unicode support and a unicode library to back it, `char` would mean something else entirely than it does now, and should not be an arithmetic type. However, since we only have a very primitive character type, I think it makes sense to be able to do certain arithmetic operations on it.
Thanks for the tip, I’ll check it out!
IMHO, you should avoid using C or C++. Try Pascal or Common Lisp.
Consider this:
char i = 0;
static_assert(std::is_same_v<decltype(i), char>);  // OK
static_assert(std::is_same_v<decltype(+i), char>); // ERROR: +/- applied to char gives int
It seems that simple application of a unary operator changes the type of an expression. Probably, backward compatibility with C makes this hole in the type system legal.
Code was clipped. Copy here: http://coliru.stacked-crooked.com/a/3f857fb27eedd96d
Integral promotion is performed for unary plus, in the same way it is done for the additive operator described in the blog post.
What exactly do you mean by a “hole in the type system”?
A decent type system shall not allow constructions like char + int implicitly, IMHO. At least, the compiler could emit a warning if one is going to be done behind the scenes and the results may be wrong/surprising.
I agree, the C / C++ type systems are too eager to do implicit conversions.
I think it's not just eagerness. Some things become logically weird pretty fast. If you want it to be more complete, you should be able to cast containers.
If this can work:
signed char sc = 123;
char c = sc;
then I think this should work:
std::vector<signed char> v_sc { 1, 2, 3 };
std::vector<char> v_c = v_sc;
If it's implicit, it should be implicit all the way through.
I’m not sure I agree.
For widening conversions: Implicitly converting e.g. a short to an int is fine; there's no overhead and you won't lose data. However, if you want to allow implicit conversions from vector<short> to vector<int>, you'll implicitly allocate memory for an entirely new vector, and copy and convert all the data. I think that's bad.
For narrowing conversions: I don’t think these should be implicit in the first place. But even if they are, allowing this for containers would be even worse, since you get allocation and copying in addition to the narrowing.
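For comparison, here's what the conversion looks like when spelled out explicitly (a sketch): constructing a new vector from the old one's range makes the allocation and per-element conversion visible at the call site.

#include <vector>

int main() {
    std::vector<signed char> v_sc { 1, 2, 3 };
    std::vector<char> v_c(v_sc.begin(), v_sc.end());  // explicit allocation + per-element conversion
}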
It's no doubt a weird example, where you can just use the same vector and things will accidentally work. Your examples should not be allowed by default. Maybe we're lacking verbosity in describing the conversions.
Yes, this particular example would work if you just treated the `vector<signed char>` as a `vector<char>` instead of constructing a new vector.
But this is actually very different from implicit conversions! When you implicitly convert a `signed char` to a `char`, the new `char` actually has the type `char`. An actual type conversion is performed and a new object is created. If you want to just re-use the object and pretend it’s of another type, that’s something else which I don’t see how we could allow.
Nitpick: Last paragraph mentions that functions will print ‘i’ or ‘u’, but there’s no code for bodies of the functions.
Thanks for the heads up! It’s a leftover from an earlier version where the functions had definitions too. Fixed now.
I don't know specifically about C++, but based on K&R, my response was int + int. In fact, atoi (ascii to integer) relies on being able to subtract two chars. Since C++ honors C syntax, I'd expect this reasoning to work in C++ as well. Also see the discussion in K&R as to whether chars are signed or unsigned (section 2.7, page 43, last paragraph).