C++ Puzzle #1: Initialization Order

Time for another puzzle! What is the output of this program?


class Parent
{
public:
    Parent(int p = 0) : p(p)
    {
        cout << "Parent(" << p << ")" << endl;
    }

    int p;
};

class Member
{
public:
    Member(int m = 0) : m(m)
    {
        cout << "Member(" << m << ")" << endl;
    }

    Member& operator=(const Member& rhs)
    {
        cout << "Member is copied" << endl;
        m = rhs.m;
        return *this;
    }

    int m;
};

class Derived : public Parent
{
public:
    Derived() : foo(10), bar(foo*2)
    {
        Parent(7);
        m = Member(42);
    }

    int bar;
    int foo;
    Member m;
};


int main()
{
    Derived d;
    cout << d.p << " " << d.foo << " " << d.bar << " " << d.m.m << endl;
}

Answer: Undefined behaviour! Which means it could output whatever, crash at any point, or format your hard drive. Can you spot the source of the undefined behaviour? Hint: Look for the use of an uninitialized variable.

Did you see that the order of foo and bar in the initializer list is not in the same order as their declarations? Members are always initialized in the order of declaration, not in the order in the initialization list. This means bar is initialized before foo, using an uninitialized foo to compute its value. If you compile with all warnings enabled (always a good idea), your compiler should warn you about this. For instance, g++ tells me:

main.cpp: In constructor ‘Derived::Derived()’:
main.cpp:29:9: warning: ‘Derived::foo’ will be initialized after [-Wreorder]
main.cpp:28:9: warning:   ‘int Derived::bar’ [-Wreorder]
main.cpp:22:5: warning:   when initialized here [-Wreorder]
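
If you want to see the rule in action without relying on undefined behaviour, here is a small sketch. The names Tracer, Widget, init_log and observed_order are made up for this illustration; each Tracer just records its name when it is constructed. Expect the same kind of -Wreorder warning when compiling it.

```cpp
#include <string>

// Hypothetical log that each Tracer appends its name to on construction.
static std::string init_log;

struct Tracer {
    explicit Tracer(const char* name) {
        init_log += name;
        init_log += ' ';
    }
};

struct Widget {
    // The initializer list names second before first...
    Widget() : second("second"), first("first") {}

    // ...but the declaration order (first, then second) is what counts.
    Tracer first;
    Tracer second;
};

// Constructs a Widget and returns the recorded initialization order.
std::string observed_order() {
    init_log.clear();
    Widget w;
    (void)w;
    return init_log;
}
```

Calling observed_order() should give "first second ", even though the initializer list mentions second first.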

Next question then, what is the likely output of this program on a real compiler? On my Ubuntu 12.04 using g++ 4.6, I get:

   
Parent(0)
Member(0)
Parent(7)
Member(42)
Member is copied
0 10 8393920 42

Let’s walk through what happens here.

1: Derived inherits from Parent. C++ guarantees that all parent objects are fully constructed before the constructor of the derived class is invoked, so the first thing that happens is that the default constructor of Parent is called. (Did I fool you with Parent(7); in the constructor though? That line will just create a local object that is never used. Had I moved it to the initializer list, it would have been used instead of the default constructor.)
2: Before the body of Derived’s constructor runs, all its members are initialized. Since we don’t explicitly initialize Member m in the initializer list, it is first default constructed, and then re-assigned in the body of the constructor (as seen in line 5 of the output).
3: In the body of Derived’s constructor, the statement Parent(7); creates a temporary Parent object which is never used. Maybe we meant to set p to 7, but put the Parent() call in the wrong place?
4-5: Now another Member is constructed, and copy assigned to m. All this re-construction and copying could have been avoided if we had moved the call Member(42) to the initializer list.
6: Finally, we have a look at the resulting values of Derived’s members:
- p is 0, not 7 as we maybe meant it to be, as explained in 1.
- foo is 10, hopefully as expected.
- bar is 8393920, due to the use of the uninitialized foo, as explained earlier. This is entirely by chance, and should not be relied on! Your program might just as well crash, or do something worse.

Finally let’s clean up the program and see what happens. We reorder the declarations of foo and bar, and move all initializations to the initializer list:
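
The listing itself did not survive in this copy of the post, so what follows is only a reconstruction from the description above, not the original main2.cpp. It assumes that foo is now declared before bar, and that Parent(7) and m(42) have moved into the initializer list. main() is unchanged from the first version and left out here.

```cpp
#include <iostream>
using namespace std;

class Parent
{
public:
    Parent(int p = 0) : p(p)
    {
        cout << "Parent(" << p << ")" << endl;
    }

    int p;
};

class Member
{
public:
    Member(int m = 0) : m(m)
    {
        cout << "Member(" << m << ")" << endl;
    }

    // Kept from the first version; it is no longer called.
    Member& operator=(const Member& rhs)
    {
        cout << "Member is copied" << endl;
        m = rhs.m;
        return *this;
    }

    int m;
};

class Derived : public Parent
{
public:
    // Base class first, then the members in declaration order: foo, bar, m.
    Derived() : Parent(7), foo(10), bar(foo * 2), m(42)
    {
    }

    int foo; // now declared, and therefore initialized, before bar
    int bar;
    Member m;
};
```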


Now the program is well defined, and free of repeated constructors and copying. Here is the output:

Parent(7)
Member(42)
7 10 20 42

Much better! Full source here.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Use boost list_of if You Can’t Have Uniform Initialization Yet

In which I demonstrate how boost::assign::list_of simplifies initialization of containers.

I have previously blogged about how Uniform Initialization Simplifies Testing. In C++98, you can initialize an array when defining it, but you cannot initialize containers the same way:

	int a[] = {1, 2, 3}; //OK
	vector<int> v = {1, 2, 3}; //Not OK

You have to resort to something like this:

	//Either
	vector<int> v;
	v.push_back(1);
	v.push_back(2);
	v.push_back(3);

	//Or
	int tmp[] = {1, 2, 3}; 
	vector<int> v2(tmp, tmp+3);

In C++0X, we will have Uniform Initialization to take care of this, but if you are not on a supported compiler yet, you can use boost::assign while you are waiting:

	vector<int> v3 = boost::assign::list_of(1)(2)(3);

This is, again, especially useful in testing. To translate the example from my last post from C++0X to C++98 with boost:

using boost::assign::list_of;
 
int count_sheep(const vector<string>& animals) {
	return count(animals.begin(), animals.end(), "sheep");
}
 
TEST(TestCountSheep, returns_zero_when_there_are_no_sheep) {
	ASSERT_EQ(0, count_sheep(list_of("pig")("cow")("giraffe"))); //here
}
	 
TEST(TestCountSheep, returns_all_sheep) {
	ASSERT_EQ(2, count_sheep(list_of("sheep")("cow")("sheep"))); //and here
}

To use boost::assign, make sure to #include <boost/assign.hpp>. It has other clever tricks as well, so be sure to check out the docs. Boost is a widely used collection of high quality libraries for C++, and can be downloaded here.
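
For comparison, here is a rough sketch of what the same helper looks like once a C++11 compiler is available. The function sheep_in_sample_farm is a made-up stand-in for the test bodies above, so that the sketch does not depend on Google Test.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Same helper as above: counts the elements equal to "sheep".
int count_sheep(const std::vector<std::string>& animals) {
    return static_cast<int>(
        std::count(animals.begin(), animals.end(), "sheep"));
}

// With C++11, a braced list converts directly to the vector parameter,
// so neither list_of nor Boost is needed at the call site.
int sheep_in_sample_farm() {
    return count_sheep({"sheep", "cow", "sheep"});
}
```

The call site reads almost the same as the list_of version, which is why list_of works well as a stopgap.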


What’s the Point of Function Try Blocks

This post is a follow-up to The Returning Function that Never Returned, which I wrote a couple of months ago. You can read it first if you want (it is only around a hundred words), but this post can be read on its own as well.

In my previous post, I presented a function with try outside of the braces. This is called a “function try block”. The example went like this:

std::string foo() try {
    bar();
    return "foo";
} catch (...) {
    log("Unable to foo!");
}

This is completely useless, even dangerous. The try should be moved inside the function itself, like this:

std::string foo() {
    try {
        bar();
        return "foo";
    } catch (...) {
        log("Unable to foo!");
        //either rethrow or throw something else
    }
    //or make sure something is returned
}

So why were function try blocks introduced to the language in the first place? Have a look at this:

class A : public B{
    A(int x) : c(x){}
    C c;
};

How do you catch an exception thrown by C’s constructor? Having try inside the function block is too late. Initializing c inside the function block is not a solution either, since c will always be initialized before the body of the constructor, even if it is not mentioned in the initializer list. The only way to catch such an exception is to use a function try block on the constructor, and that is also why they were introduced. (Note that this argument is also valid if B’s constructor might throw.)

This is also the only sane case in which to use them. The other two candidates are regular functions and destructors, both of which I will get back to in a moment.

First, let’s have a look at what you can do with a function try block on a constructor. When the catch block reaches its end, it will rethrow the exception. It is impossible to swallow it. If you could swallow the exception, the code that tried to construct an object of this type would have no way of knowing that construction failed. A failed construction means the object doesn’t exist, and it doesn’t make sense to continue pretending nothing has happened. The only thing you can do is to throw an exception of another type, or cause a side effect (such as logging) and then rethrow. In particular, you cannot try to recover from the problem.
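
As a minimal sketch of this, here is a constructor with a function try block that translates an exception from a member’s constructor. The throwing C, the ConstructionFailed type and the helper try_construct are all invented for this illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical member type whose constructor throws for negative input.
struct C {
    explicit C(int x) {
        if (x < 0) throw std::invalid_argument("negative");
    }
};

// Hypothetical exception type used to translate the failure.
struct ConstructionFailed : std::runtime_error {
    explicit ConstructionFailed(const std::string& what)
        : std::runtime_error(what) {}
};

class A {
public:
    // The try covers the member initializer list as well as the body.
    explicit A(int x) try : c(x) {
    } catch (const std::invalid_argument& e) {
        // Translate the exception. Reaching the end of this handler
        // without throwing would rethrow the original one automatically.
        throw ConstructionFailed(std::string("A: ") + e.what());
    }

private:
    C c;
};

// Returns a description of what happened when constructing A(x).
std::string try_construct(int x) {
    try {
        A a(x);
        (void)a;
        return "ok";
    } catch (const ConstructionFailed& e) {
        return e.what();
    }
}
```

The caller still sees a failed construction either way; the handler only gets to choose which exception reports it.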

The same goes for destructors, you cannot swallow the exception. And since you really should avoid having destructors throw, using function try blocks on destructors is a bad idea.

Try blocks on regular functions behave a bit differently; if the end of the catch block is reached, the function will automatically return. But if you have a non-void function, this doesn’t make sense at all, as I mentioned in my previous post.
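
Just to illustrate that behaviour (not as a recommendation), here is a small sketch with a void function. The counter failures_logged is a made-up stand-in for a real log, and bar() is assumed to always throw.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a real log.
static int failures_logged = 0;

// Assumed to always fail, for the sake of the example.
void bar() {
    throw std::runtime_error("bar failed");
}

// Function try block on a void function: when the handler reaches
// its end, the function simply returns to its caller.
void foo() try {
    bar();
} catch (const std::exception&) {
    ++failures_logged;
}
```

The caller of foo() never sees the exception, which is exactly what you get from an ordinary try block inside the function body as well.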

So in conclusion:

Only use function try blocks for constructors.
Don’t try to do anything other than rethrowing (possibly as another type) or causing side effects like logging.

For a more in-depth discussion of this, have a look at Sutter’s Mill: Constructor Failures (or, The Objects That Never Were) by Herb Sutter. If you would like a recap of exception handling and constructor initializers thrown in, I recommend starting with Introduction to Function Try Blocks by Alan Nash.

Puzzle #0: Call Sequence

Here’s a puzzle that should highlight a couple of interesting features in C++:

What is the output of the following program?

#include <iostream>
using namespace std;

int main();

void term() {
    cout << "term()" << endl;
    main();
}

struct Positive {
    Positive(int i) {
        set_terminate(term);
        cout << "Positive::Positive(" << i << ")" << endl;
        if (i <= 0)
            throw main;
    }
};

Positive n(-1);

int main() {
    cout << "main()" << endl;
    Positive p(1);
}

If you just want the answer, you will find it at the bottom of this post. But first, I will go through the why.

Lines 1-2 are uninteresting. Lines 4-18 contain all the magic, but I’ll come back to those; I want to go through the program in chronological order, as it is being executed. The entry point for this program is not main(), but rather the definition of Positive n(-1). This variable is global, and so will be initialized before main() is called. To initialize it, its constructor is called with i = -1, and so the real action starts on line 13.

On line 13, I use set_terminate() to set a termination handler. The termination handler will be called if a thrown exception is not handled. Otherwise, this statement has no visible immediate effect. We then encounter the first printout of this program, on line 14:

Positive::Positive(-1)

Our Positive class is however designed to only handle positive numbers, and throws on line 16. It is set to throw main, but this is really just a decoy. In C++, you can throw whatever you want; I happen to throw a function pointer to main(). In particular, this does not mean that main() is called.
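
To see that a thrown function pointer is just a value like any other, here is a small sketch that throws one and catches it by its exact type. The functions answer and call_thrown_function are invented for this example.

```cpp
// Hypothetical function whose address is thrown.
int answer() {
    return 42;
}

// Throws a pointer to answer() and catches it by its exact type.
// Throwing the pointer does not call the function; it is the handler
// that decides whether to call it.
int call_thrown_function() {
    try {
        throw &answer;
    } catch (int (*fn)()) {
        return fn();
    }
    return -1; // not reached
}
```

In the puzzle nobody catches the pointer, so it is never called through the exception at all.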

The exception thrown on line 16 is never handled, and so the builtin terminate() calls the termination handler term(), which we installed with set_terminate() and which is defined on lines 6-9. Now we encounter our second printout:

term()

This is a straightforward cout on line 7. Then, we manually call main(). Strictly speaking, the C++ standard does not allow a program to use main itself (unlike C), but gcc accepts it and treats it like a call to any other function. This is the third printout:

main()

main() then attempts to instantiate the local Positive object p(1). This calls the Positive constructor, which prints out

Positive::Positive(1)

This is a valid positive number, so no more exceptions are thrown. Control is returned to main(), which returns control to term(), which, being a termination handler, is not allowed to return, but still attempts to do so. I cannot find anything in the standard about what exactly is supposed to happen in this case, so I guess this is undefined behaviour. No matter what, returning doesn’t make any sense, as there is nowhere to return to. What happens on my system (gcc@linux) is program abortion with SIGABRT. What happens on yours?

Finally, the complete output of the program:


$ ./outsidemain
Positive::Positive(-1)
term()
main()
Positive::Positive(1)
Aborted

This is probably not the most useful post I have written, but it highlights a few interesting points, and making a puzzle is always fun.

Array Initialization Initializes all Elements

I was talking about C++ with a Java programmer the other day, as he had to work on a bit of C++ code. I discovered that it isn’t necessarily obvious that all the elements in an array are constructed when the array is initialized. That is however always the case.

For arrays of built-in types, the values aren’t actually set to a specific default (except for non-local and static arrays), and will end up having arbitrary values.

For arrays of a user-defined type, the default constructor is used for all the array elements. If you don’t want this behaviour, you need to use an array of pointers. All the elements will still be initialized, but the elements being initialized are now the pointers, not the actual objects. It is worth mentioning that pointers count as a built-in type, so the elements of local, non-static arrays of pointers will have undefined values. In particular, they are not initialized to point to NULL.

The reason why all the elements are initialized is that there is no such thing as a “null-object”. You can have a null-pointer not pointing to anything, but you cannot have an actual object that isn’t really an object, so to speak. This of course goes for Java as well, but here, an array of a user-defined type is actually an array of pointers, even though you don’t see any *s in the code. (Java uses pointer semantics for user-defined types and value semantics for built-in types.)

Here is a summary using actual C++ code:

int ints[10]; //Non-local array, all elements ==0
void func() {
    int ints[10]; //Local array, no default value is set. The elements exist, but hold arbitrary values until assigned.
    static int sints[10]; //Static local array, all elements are initialized to 0
    Foo foos[10]; //User-defined type, all elements are constructed using the default constructor Foo::Foo();
    Foo *pfoos[10]; //Local array of pointers, no default value is set. The pointers point all over the place.
}
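
If you would rather check this than take my word for it, here is a small sketch that counts constructor calls. This Foo, with its static counter, and the two helper functions are made up for the illustration.

```cpp
// Hypothetical type that counts how often it is default-constructed.
struct Foo {
    static int constructed;
    Foo() { ++constructed; }
};
int Foo::constructed = 0;

// Returns how many Foo objects get constructed by an array of 10 Foos.
int constructed_by_object_array() {
    Foo::constructed = 0;
    Foo foos[10];
    (void)foos;
    return Foo::constructed;
}

// Returns how many Foo objects get constructed by an array of 10 pointers.
int constructed_by_pointer_array() {
    Foo::constructed = 0;
    Foo* pfoos[10] = {}; // explicitly zeroed here, so the pointers are null
    (void)pfoos;
    return Foo::constructed;
}
```

The first function should report 10 constructions and the second one 0, since an array of pointers contains no Foo objects at all.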

If your class has no public default constructor, it is impossible to create an array of such objects:

class NoDefault {
private:
    NoDefault(); //Make default constructor private, and hence unusable
};
NoDefault nodefaults[10]; //Impossible!
NoDefault *pnodefaults[10]; //This is fine, no objects are constructed, only pointers.

Note also that you cannot have an array of references. The most obvious reason is that a reference must always refer to an object. There is no such thing as a “null-reference”. One could however imagine getting around this using an initializer list, like this:

    Foo f1, f2;
    Foo& foors[2] = {f1, f2};

Now we don’t try to initialize references that don’t refer to anything. This is still not allowed though. One reason is that you cannot point to a reference, so accessing elements in the array in a normal sense like foors[1] wouldn’t make sense. In C++, you can only have arrays of objects, and references are not objects. (Pointers are objects though.)

In summary:

Elements of arrays of a user-defined type are initialized using the default constructor.
Elements of local non-static arrays of a built-in type are not explicitly initialized, and will have arbitrary values.
Elements of non-local and static arrays of a built-in type are initialized to their default value (0).
You cannot have an array of a user-defined type without a public default constructor.
You can only have arrays of objects and pointers, not references.

Be Careful with Static Variables

In a project I am working on, we are using Google Test to write and run our unit tests. Google Test is a really nice unit test framework, but that is another story. The thing you need to know for this post, is that it has a flag --gtest_repeat, which will make it run the test several times.

The other day, we ran into a strange problem. The following would work:
$ ./Test; ./Test
whereas the following would break:
$ ./Test --gtest_repeat=2

The code that was being exercised involves quite a bit of socket programming, so at first we wasted some time looking for improper shutdowns and reuse of sockets, but that was a dead end. We also confirmed that our SetUp() and TearDown() were executed properly each time the test ran. Then we discovered that running the test manually two times in a row worked, which left us a bit puzzled.

If you read the title, you might have figured out our problem already, but here it is:

Non-local static variables are always initialized before main() is executed. We used a static variable to keep track of the state in a dummy object for our test, and relied on the runtime system to initialize it for us. When we ran the test twice manually, the binary was actually executed twice, which resulted in the variable also being initialized twice. But when passing --gtest_repeat to Google Test’s main(), the binary was only executed once, so the variable was only initialized once, and state was kept from one pass to the next. Initializing the variable manually in a SetUp() function fixed the problem.
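
Our real code is too involved to show here, but the effect can be sketched in a few lines. Everything below (connections_seen, test_body, set_up and the two helper functions) is invented for this illustration and does not use Google Test itself.

```cpp
// Hypothetical dummy object state, kept in a non-local static variable.
// It is initialized once per process, however often the test body runs.
static int connections_seen = 0;

// Stand-in for a test body: passes only if it starts from a clean state.
bool test_body() {
    ++connections_seen;
    return connections_seen == 1;
}

// Stand-in for SetUp(): resets the state explicitly before each run.
void set_up() {
    connections_seen = 0;
}

// Simulates two repeated runs in one process, without any SetUp().
bool second_run_passes_without_setup() {
    connections_seen = 0; // the state as it is at program start
    test_body();
    return test_body();
}

// The same two runs, but resetting the state before each one.
bool second_run_passes_with_setup() {
    set_up();
    test_body();
    set_up();
    return test_body();
}
```

Without the reset, the second run sees the state left behind by the first one and fails; with it, both runs start from the same state.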

Appendix A: Why non-local static variables are initialized before main()

Imagine you have a global static variable defined in one translation unit (a translation unit is basically one .cc-file with headers included), but declared and used in several others. There is no way for the compiler to know where it will be used first, without having access to all translation units, which only the linker has. The best C++ can do is to have the runtime system initialize all such variables before main() is executed. Also note that due to the same argument, there is no guaranteed order of initialization of non-local static variables.

But why can’t the variable just be automatically wrapped with code that initializes it the first time it is accessed? Think about it for a moment. Where would that code need to be placed? In all the places that access that variable! And again, this might be in many separate translation units.
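
You can do that wrapping by hand, though, by hiding the variable behind a function and making it a static local. This is only a sketch of that idea; config, make_config and the counter initializations are names I made up for it.

```cpp
#include <string>

// Counts how often the initializer runs.
static int initializations = 0;

// Hypothetical initializer with an observable side effect.
std::string make_config() {
    ++initializations;
    return "default";
}

// The static local is initialized the first time control passes through
// its definition, and every user goes through this one function.
std::string& config() {
    static std::string instance = make_config();
    return instance;
}
```

The initialization code now lives in exactly one place, at the price of every access being a function call.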

When variables are initialized in C++ is an interesting topic in itself, and I guess I will be coming back to that in a later post.

