C++ Puzzle #1: Initialization Order


Time for another puzzle! What is the output of this program?


class Parent
{
public:
    Parent(int p=0.0) : p(p)
    {
        cout << "Parent(" << p << ")" << endl;
    }

    int p;
};

class Member
{
public:
    Member(int m=0.0) : m(m)
    {
        cout << "Member(" << m << ")" << endl;
    }

    Member& operator=(const Member& rhs)
    {
            cout << "Member is copied" << endl;
            m = rhs.m;
            return *this;
    }

    int m;
};

class Derived : public Parent
{
public:
    Derived() : foo(10), bar(foo*2)
    {
        Parent(7);
        m = Member(42);
    }

    int bar;
    int foo;
    Member m;
};


int main()
{
    Derived d;
    cout << d.p << " " << d.foo << " " << d.bar << " " << d.m.m << endl;
}

Answer: Undefined behaviour! Which means it could output whatever, crash at any point, or format your hard drive. Can you spot the source of the undefined behaviour? Hint: Look for the use of an uninitialized variable.

Did you see that the order of foo and bar in the initializer list is not in the same order as their declarations? Members are always initialized in the order of declaration, not in the order in the initialization list. This means bar is initialized before foo, using an uninitialized foo to compute its value. If you compile with all warnings enabled (always a good idea), your compiler should warn you about this. For instance, g++ tells me:

main.cpp: In constructor ‘Derived::Derived()’:
main.cpp:29:9: warning: ‘Derived::foo’ will be initialized after [-Wreorder]
main.cpp:28:9: warning:   ‘int Derived::bar’ [-Wreorder]
main.cpp:22:5: warning:   when initialized here [-Wreorder]

Next question then, what is the likely output of this program on a real compiler? On my Ubuntu 12.04 using g++ 4.6, I get:

   
Parent(0)
Member(0)
Parent(7)
Member(42)
Member is copied
0 10 8393920 42  

Lets’ walk through what happens here.

  • 1: Derived inherits from Parent. C++ guarantees that all parent objects are fully constructed before the constructor of the derived class is invoked, so the first thing that happens is that the default constructor of Parent is called. (Did I fool you with Parent(7); in the constructor though? That line will just create a local object that is never used. Had I moved it to the initializer list, it would have been used instead of the default constructor.)
  • 2: Before the body of Derived‘s constructor, all its members will be initialized. Since we don’t specifically initialize Member m in the initializer list, it is first automatically default constructed, and then re-assigned in the body of the constructor (as seen in line 5 of the output).
  • 3: On line 34 of the program, we create a local Parent object which is never used. Maybe we meant to set p to 7, but put the Parent() call in the wrong place?
  • 4-5: Now another Member is constructed, and copy assigned to m. All this re-construction and copying could have been avoided if we had moved the call Member(42) to the initializer list.
  • 6: Finally, we have a look at the resulting values of Derived‘s members:
    • p is 0, not 7 as we maybe meant it to be, as explained in 1.
    • foo is 10, hopefully as expected.
    • bar is 8393920, due to the use of the uninitialized foo, as explained earlier. This is entirely by chance, and should not be relied on! Your program might just as well crash, or do something worse.

Finally let’s clean up the program and see what happens. We reorder the declarations of foo and bar, and move all initializations to the initializer list:

ERROR: Couldn't open file: [Errno 2] No such file or directory: '/home/anders/Documents/code/blog/initialization/main2.cpp'

Now the program is well defined, and free of repeated constructors and copying. Here is the output:

Parent(7)
Member(42)
7 10 20 42

Much better! Full source here.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Undefined Behaviour — Worse Than its Reputation?


Last week I wrote about The Difference Between Unspecified and Undefined Behaviour. This week I’d like to expand a bit more on the severity of undefined behaviour. If however you have a lot of time, instead go read A Guide to Undefined Behavior in C and C++ by John Regehr of the University of Utah, and then What Every C Programmer Should Know About Undefined Behavior by Chris Lattner of the LLVM project, as they cover this material in much more depth (and a lot more words!) than I do here.

To expand on the example from last week, what is the output of this program?

int main()
{
    int array[] = {1,2,3};
    cout << array[3] << endl;
    cout << "Goodbye, cruel world!" << endl;
}

A good guess would be a random integer on one line, then “Goodbye, cruel world!” on another line. A better guess would be that anything can happen on the first line, but then “Goodbye, cruel world!” for sure is printed. The answer is however that we can’t even know that, since If any step in a program’s execution has undefined behavior, then the entire execution is without meaning. [Regehr p.1].

This fact has two implications that I want to emphasize:

1: An optimizing compiler can move the undefined operation to a different place than it is given in the source code
[Regehr p.3] gives a good example of this:

int a;

void foo (unsigned y, unsigned z)
{
  bar();
  a = y%z; //Possible divide by zero
}

What happens if we call foo(1,0)? You would think bar() gets called, and then the program crashes. The compiler is however allowed to reorder the two lines in foo(), and [Regehr p.3] indeed shows that Clang does exactly this.

What are the implications? If you are investigating a crash in your program and never see the results of bar(), you might falsely conclude that the bug in the sourcecode must be before bar() is called, or in its very beginning. To find the real bug in this case you would have to turn off optimization, or step through the program in a debugger.

2: Seemingly unrelated code can be optimized away near a possible undefined behaviour
[Lattner p.1] presents a good example:

void contains_null_check(int *P) {
  int dead = *P;
  if (P == 0)
    return;
  *P = 4;
}

What happens if P is NULL? Maybe some garbage gets stored in int dead? Maybe dereferencing P crashes the program? At least we can be sure that we will never reach the last line, *P = 4 because of the check if (P == 0). Or can we?

An optimizing compiler applies its optimizations in series, not in one omniscient operation. Imagine two optimizations acting on this code, “Redundant Null Check Elimination” and “Dead Code Elimination” (in that order).

During Redundant Null Check Elimination, the compiler figures that if P == NULL, then int dead = *P; results in undefined behaviour, and the entire execution is undefined. The compiler can basically do whatever it wants. If P != NULL however, there is no need for the if-check. So it safley optimizes it away:

void contains_null_check(int *P) {
  int dead = *P;
  //if (P == 0)
    //return;
  *P = 4;
}

During Dead Code Elimination, the compiler figures out that dead is never used, and optimizes that line away as well. This invalidates the assumption made by Redundant Null Check Elimination, but the compiler has no way of knowing this, and we end up with this:

void contains_null_check(int *P) {
  *P = 4;
}

When we wrote this piece of code, we were sure (or so we thought) that *P = 4 would never be reached when P == NULL, but the compiler (correctly) optimized away the guard we meticulously had put in place.

Concluding notes
If you thought undefined behaviour only affected the operation in which it appears, I hope I have convinced you otherwise. And if you found the topic interesting, I really recommend reading the two articles I mentioned in the beginning (A Guide to Undefined Behavior in C and C++ and What Every C Programmer Should Know About Undefined Behavior). And the morale of the story is of course to avoid undefined behaviour like the plague.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

The Difference Between Unspecified and Undefined Behaviour


What is the output of this program?

int main()
{
    int array[] = {1,2,3};
    cout << array[3] << endl;
}

Answer: Noone knows!

What is the output of this program?

void f(int i, int j){}

int foo()
{
    cout << "foo ";
    return 42;
}

int bar()
{
    cout << "bar ";
    return 42;
}

int main()
{
    f(foo(), bar());
}

Answer: Noone knows!

There is a difference in the severity of uncertainty though. The first case results in undefined behaviour (because we are indexing outside of the array), whereas the second results in unspecified behaviour (because we don’t know the order in which the function arguments will be evaluated). What is the difference?

In the case of undefined behaviour, we are screwed. Anything can happen, from what you thought should happen, to the program sending threatening letters to your neighbour’s cat. Probably it will read the memory right after where the array is stored, interpret whatever garbage is there and print it, but there is no way to know this.

In the case of unspecified behaviour however, we are probably OK. The implementation is allowed to choose from a set of well-defined behaviours. In our case, there are two possibilities, calling foo() then bar(), or bar() then foo(). Note that if foo() and bar() have some side-effects that we rely on being executed in a specific order, this unspecified behaviour would still mean we have a bug in our code.

To summarize, never write code that results in undefined behaviour, and never write code that relies on unspecified behaviour.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Puzzle #0: Call Sequence


Here’s a puzzle that should highlight a couple of interesting features in C++:

What is the output of the following program?

#include <iostream>;
using namespace std;

int main();

void term() {
    cout << "term()" << endl;
    main();
}

struct Positive {
    Positive(int i) {
        set_terminate(term);
        cout << "Positive::Positive(" << i << ")" << endl;
        if (i <= 0)
            throw main;
    }
};

Positive n(-1);

int main() {
    cout << "main()" << endl;
    Positive p(1);
}

If you just want the answer, you will find it at the bottom of this post. But first, I will go through the why.

Lines 1-2 are uninteresting. Lines 4-18 contain all the magic, but I’ll come back to those, I want to go through the program in chronological order as it is being executed. The entry point for this program is not main(), but rather the definition of Positive n(-1). This variable is global, and so will be initialized before main() is called. To initialize it, its constructor is called with i = -1, and so the real action starts on line 13.

On line 13, I use set_terminate() to set a termination handler. The termination handler will be called if a thrown exception is not handled. Otherwise, this statement has no visible immediate effect. We then encounter the first printout of this program, on line 14:

Positive::Positive(-1)

Our Positive class is however designed to only handle positive numbers, and throws on line 16. It is set to throw main, but this is really just a decoy. In C++, you can throw whatever you want, I happen to throw a function pointer to main(). In particular, this does not mean that main() is called.

The exception thrown on line 16 is never handled, and so the builtin terminate() calls the termination handler set_terminate(), which is defined on lines 6-9. Now we encounter our second printout:

term()

This is a straightforward cout on line 7. Then, we manually call main(). main is just a normal function, and even though it is special in that C++ will call it for you at startup, you are free to call it manually whenever you want. This is the third printout:

main()

main() then attempts to instantiate the local Positive object p(1). This calls the Positive constructor, which prints out

Positive::Positive(1)

This is a valid positive number, so no more exceptions are thrown. Control is returned to main(), which returns control to term(), which, being a termination handler, is not allowed to return, but still attempts to do so. I cannot find anything in the standard about what exactly is supposed to happen in this case, so I guess this is undefined behaviour. No matter what, returning doesn’t make any sense, as there is nowhere to return to. What happens on my system (gcc@linux) is program abortion with SIGABRT. What happens on yours?

Finally, the complete output of the program

$ ./outsidemain
Positive::Positive(-1)
term()
main()
Positive::Positive(1)
Aborted

This is probably not the most useful post I have written, but it highlights a few interesting points, and making a puzzle is always fun.

The returning function that never returned


Investigating a crash report yesterday, I came across this piece of code (simplified for the purpose of this post):

std::string foo() try {
    bar();
    return "foo";
} catch (...) {
    log("Unable to foo!")
}

int main() {
    std::string s = foo();
}

Today’s exercise: what happens if bar() throws? Obviously, the function doesn’t return, since the exception is thrown and control is passed to the catch block. But since this block does not return, and does not rethrow, foo() neither returns nor throws, so what goes into the string s? Answer: No one knows, we are left in the happy land called undefined behaviour.

The morale? If the original implementor had used -Wall, he would have gotten a warning that execution reaches the end of the non-void function. (Both Sun Studio and gcc happlily compiles without complaining otherwise.)

Update:I have posted a follow up going into more details on function try blocks.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter