How to avoid includes in headers

I have now written twice about why you should minimize the use of include in header files. As one reader on Reddit politely put it (“crap article”), it is about time I write a post about how to do that.

There is mainly two things you can do:

Reduce the number of headers you include.
Reduce the contents of those headers.

1: Reducing the number of `include`s

Why do you need includes in the header in the first place? When you compile a file that includes header A, all the names that are used in header A need to be defined. So if header A uses an name defined in header B, header A should include header B (or else anyone who includes A will have to include B as well).

What do I mean by “using” a name then? Basically everything that mentions the name, except as a pointer or reference. Some examples of using a name:

SomeClass f; //Creating an object
some_function(); //Calling a function
class Derived : public SomeClass; //Inheriting from a class

Some examples of mentioning a name but not using it (only needs access to a declaration, not the definition):

SomeClass* f1; //Declaring a variable of pointer type
SomeClass& f2; //Declaring a variable of reference type

As soon as you want to use those though, they must be completely defined:

f1->function(); //Needs definition
f2->member; //Needs definition

Also note that the new type of enum, (enum class) only needs to be declared to be used, as long as you don’t actually mention its value:

SomeEnum e; //Declaring a variable of oldschool enum type needs definition.
SomeEnumClass e; //Declaring a variable of newschool enum type does not need definition...
SomeEnumClass e = SomeEnumClass::VALUE; //...but using a value does.

So given this knowledge, how do we reduce the number of includes in headers?

The first thing to do is to make sure you only include what is necessary to get the header file to compile. If you have a class definition in class.h and the implementation in class.cpp, the latter will typically need includes that are not needed by the header, and so should be put in the cpp. A simple way to check that you don’t have any unnecessary includes is to create an empty cpp file that only includes class.h. If it compiles, all the necessary headers are there. Then you can try commenting out includes in class.h and see which ones actually need to be there. Move the rest down to class.cpp.

The next thing to do is to check whether you actually need access to the definitions, or if a declaration will do. Try commenting out one #include at a time, and checking which names the compiler complains about. Then see if you are actually using that name, or if you are merely holding a pointer or a reference to an object of the type. If so, move the include down to the .cpp file, and add a forward declaration. Here’s an example:

#include "stuff.h"

class Ohlson
{
public:
    void doThings(Stuff* stuff);
};

The include stuff.h in not needed, as we never use stuff. Instead, we can use a forward declaration:

class Stuff; //Forward declaration instead of include

class Ohlson
{
public:
    void doThings(Stuff* stuff);
};

In the definition of Ohlson::doThings(Stuff* stuff) however (typically in ohlson.cpp), we probably need access to the definition of Stuff and need to include it.

So one of the things that will reduce the number of includes in headers is using pointers/references instead of values. There are other considerations to design as well though, so I would not advocate moving from values to pointers without considering the full picture.

Another thing that helps is to have the definition of functions in the .cpp file instead of the .h file. This however prevents inlining, so again, use caution.

A final technique that could help is the Pimpl Idiom. Briefly explained, instead of keeping your private members in the header, you keep them in a struct in the .cpp file. This means you don’t need access to their definitions in the header file, and can move their respective includes down to the .cpp file as well.

Reducing the content of includes

The other thing you can do to reduce the negative effects of includes in headers, if you cannot avoid them, is to reduce the content of those included headers. Here are some techniques:

C++ supports several paradigms, but a good recommendation for most is to depend on abstractions instead of implementations. In object oriented programming, this is done by programming to interfaces. C++ doesn’t have an explicit interface concept, but a class with only pure virtual methods is the same thing. Interfaces should also be kept small and focused. This means an interface should be very quick to compile, reducing the pain of having it included in many files. You will however still need to recompile all the files that include it when it changes, but the more focused it is, the fewer files will depend on it.

A generalization of the above is to avoid large inheritance hierarchies, even if you aren’t able to only use pure interfaces.

If you absolutely need to include a file in your header, but it is one that you own, you can reduce the content of that file by applying the techniques mentioned in the first part of this article to that file. (This is just section 1 seen from the other side of the table.)

A less straightforward technique is to be a bit clever with your templates. Templates are usually completely defined in header files, meaning that a change in the implementation triggers a recompile of everything that uses that template. There is however ways to move the implementation of templates out of header files, for instance as in this Dr. Dobbs article. C++11’s extern template can also help reduce compilation time. Avoiding the use of templates completely of course also solves the problem, but there was probably a reason why you wanted them in the first place.

One final technique that cannot go unmentioned is precompiled headers. While I don’t like what they do to dependencies, I consider them a necessary evil in medium to large projects to keep compile-times reasonable. Precompiled headers are very well explained in the first link in this paragraph, but here’s a short version: Headers can take a long time to compile. If they also change rarely, it would be nice to be able to cache them, and this it what precompiled headers does. In Visual Studio for instance, you put these includes in a file conventionally called stdafx.h, and then include stdafx.h in all your cpp files instead of the actual headers you need. The compiler then only has to compile them once. Headers from the standard library are typically placed here, and third party headers are also usually a good fit.

Those are the techniques I could think of, but I am sure I missed some. Which are you favourite ones? Which did I miss? Which of them are silly? Please use the comment section below.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Another Reason to Avoid #includes in Headers

I have already argued that you shouldn’t put all your includes in your .h files. Here is one more reason, compilation time.

Have a look at this example, where the arrows mean “includes”

file5.h apparently need access to something which is defined in file4.h, which again needs access to three other headers. Because of this, all other files that include file5.h also includes file[1-4].h.

In c++, #include "file.h" really means “copy the entire contents of file.h here before compiling”. So in this example, file[1-3] is copied into file4.h, which is then copied into file5.h, which again is copied into the three cpp files. Every file takes a bit of time to compile, and now each cpp file doesn’t only need to compile its own content, but also all of the (directly and indirectly) included headers. What happens if we add some compilation times to our diagram? The compilation time of the file itself is on the left, the compilation time including the included headers are on the right.

As we can see, this has a dramatic effect on compilation time. file6.cpp and file7.cpp just needed something from file5.h, but got an entire tree of headers which added 1.2 seconds to their compilation times. This might not sound much, but those numbers add up when the number of files is large. Also, if you’re trying to do TDD, every second counts. And in some cases, compilation times of individual headers can be a lot worse than in this example.

What if file5 didn’t really need to have #include "file4.h" in the header, but could move it to the source file instead? Then we would have this diagram:

The compilation time of file[6-7].cpp is significantly reduced.

Now let’s look at what happens if a header file is modified. Let’s say you need to make a minor update in file1.h. When that file is changed, all the files that include it need to be recompiled as well:

But if we were able to #include "file4.h" in file5.cpp instead of file5.h, only one cpp file would need to recompile:

Recompilation time: 1.7 seconds vs. 4.5 seconds. And in a large project, this would be a lot worse. So please try to move your #includes down into the cpp files whenever you can!

The Graphviz code and makefile for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Efficient Pure Functional Programming in C++ Using Move Semantics

In which I briefly mention what pure functional programming is, explain why this can be slow in C++, and use move semantics to solve that problem. Be warned that this post is a bit longer than usual for this blog, and that it assumes more knowledge of C++11 than my posts usually do.

Pure functional programming is programming without state. In short, functions should not modify their input, not modify local or global state, but return new values instead. Often, we also want to avoid variables completely.

For instance, instead of doing:

void square(vector<int>& v) { /*multiply each element with itself*/ }

void imperative()
{
    vector<int> v = getNumbers();
    square(v);
    doStuffWith(v);
}

we do

vector<int> squared(const vector<int>& v) { /*return a new vector with squared elements*/ }

void functional()
{
    doStuffWith(
        squared(
            getNumbers()));
}

As you might already have noticed, this style results in more copies than the imperative style, which will lead to less efficient code. Depending on the problem you are solving, this might or might not matter, but one of the few real arguments for using C++ is that you have a problem that needs an efficient solution.

Let’s take a look at an example program, translate it from imperative to functional style, and finally use move semantics (explained later) to eliminate the performance hit. For our example, we will compute the root mean square of a function. Wikipedia has a good definition, but in short: Given a list of numbers, square each of them, take the mean of those squares, and return the square root of that number.

First, let’s define some helper functions:

double sum(const std::vector<double>& v)
{
    return std::accumulate(v.begin(), v.end(), 0, std::plus<double>());
}

double mean(const std::vector<double>& v)
{
    return sum(v) / v.size();
}

The square root function already exists as std::sqrt, so we don’t need to define that. The only missing piece is a function computing the square of a vector of numbers. In good old C++, there are two ways of doing this:

vector<int> square(vector<int> v); //(Or the argument could be a const reference)

void square(vector<int>& v);

The first version creates a copy of the vector, containing the squared number. This is perfect for functional code, but makes a copy that might be unnecessary. The second version modifies the argument, which is faster and often seen in imperative code. Since we assume we want efficiency, we’ll go with the second one in the examples that follow.

Here is the full, traditional, stateful solution, including square and a usage example:

    void square(std::vector<double>& v)
    {
        std::transform(v.begin(), v.end(), v.begin(), [](double d) { return d*d; });
    }

    double rms(std::vector<double>& v)
    {
        square(v);
        double the_mean = mean(v);
        double the_rms = std::sqrt(the_mean);
        return the_rms;
    }

    int main()
    {
        std::vector<double> v = {1,2,3};
        double the_rms = rms(v);
        std::cout << the_rms << std::endl;
        return 0;
    }

I won’t go into the full discussion of why functional programming is nice, but with no state there is a whole class of problems we never get into. How often have you not found a bug that was due to some code you didn’t know about mutating some seemingly unrelated state? Also, functions are a lot easier to both test and reason about when the only thing they care about is their input and output.

Let’s have a look at a pure functional alternative:

    std::vector<double> squared(std::vector<double> v)
    {
        std::transform(v.begin(), v.end(), v.begin(), [](double d) { return d*d; });
        return std::move(v);
    }

    double rms(const std::vector<double>& v)
    {
        return std::sqrt(mean(squared(v)));
    }

    int main()
    {
        std::cout << rms({1,2,3}) << std::endl;
        return 0;
    }

As you can see, no state, and no variables except for the function parameters, which we cannot really avoid. The implementation of rms even reads exactly like its mathematical definition, root(mean(squared(numbers))). There is however the issue of the copy being made when squared() is called. (No copy should be made when squared() returns though, due to the return value optimization).

Enter C++11’s move semantics. If you don’t know what move semantics and rvalue references are, the simplest explanation I have come across is this StackOverflow answer. You really should read it through, but if you’re short on time, here is the even quicker intro: If a squared(std::vector v) somehow knew that whoever called that function would never need the original v again, it could just “steal” it instead of making a copy. In the example above, {1,2,3} is a temporary object that can never be referred to again, so instead of copying it into squared, we could just have the internal storage of v point to the same place in memory. And this is exactly what happens with rvalue references.

Rvalue references are denoted by a double ampersand: squared(std::vector&& v), and can only bind to temporaries. Here is the pure functional example again, this time using rvalue references. This version completely avoids copying the vector:

    std::vector<double> squared(std::vector<double>&& v)
    {
        std::transform(v.begin(), v.end(), v.begin(), [](double d) { return d*d; });
        return std::move(v);
    }

    double rms(std::vector<double>&& v)
    {
        return std::sqrt(mean(squared(std::move(v))));
    }

    int main()
    {
        std::cout << rms({1,2,3}) << std::endl;
        return 0;
    }

{1,2,3} is a temporary, and so rms(std::vector&& v) is free to steal its resources without a deep copy. Inside of rms however, v itself is not a temporary. It is a named variable than can be referred to, and thus it would not be safe for squared(std::vector&& v) to steal its resources. We however know that we will never use it again after calling squared(v), so we tell the compiler to go ahead and steal anyway by wrapping it in std::move(). The same is done for the return value in squared().

In the functional examples, the vector that we wanted to compute the root mean square of was a temporary, thus allowing move semantics. But what if we wanted to call rms() from a non-pure part of the code base? Here’s what would happen if the original vector was a normal variable:

    std::vector<double> v = {1,2,3};
    //rms(v); //invalid initialization of reference of type ‘std::vector<double>&&’ from expression of type ‘std::vector<int>’

As you can see, we can no longer use the functional rms(). If we however know that we will not be using the original v again, we could wrap it in std::move() as we did earlier:

    rms(std::move(v));

Finally, if that is not an option, we could manually make a copy:

    rms(std::vector<double>(v));

In a pure functional program, our functional version of rms would be both fast and fitting the style we’re in. In an imperative program, it would stylistically be a worse fit, but we still wouldn’t loose any performance.

Finally, a few notes:

I have not considered to what degree copy elision would achieve the same benefits as move semantics for this example.
I decided not to complicate the examples using templates and universal references but that would be an interesting extension.
If you haven’t seen move semantics before; It is a general technique that has nothing to do with functional programming. I just think they make a great team.
There is a whole lot more to rvalues and move semantics than could be presented here, it is seen as one of the defining features of C++11.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

A Bad(?) Excuse for Private Inheritance

A few weeks ago I introudced private inheritance, and finished with a comment about a common excuse for using it. In this post I give an example of such usage, and discuss whether it is a good idea or not.

Say you are going to write a new class CppInfo, which will print out some information about the C++ language. Here is an example:

class CppInfo
{
public:
    string info()
    {
        return "Language: C++\nCreator : Bjarne Stroustrup\nQuality : ??";
    }
};

Now we need some way to assess the quality of C++, to replace those question marks. Luckily, there exists a class that assesses the quality of a programming language:

class LanguageQuality
{
public:
    virtual ~LanguageQuality() {}

    double quality() const
    {
        return 100.0 / charsForHelloWorld();
    }
protected:
    virtual int charsForHelloWorld() const = 0;
};

As you can see, the only thing we need to do is to create a derived class from LanguageQuality for C++, returning the number of characters required for a C++ Hello, World program.

This is where private inheritance can come in handy. We need access to LanguageQuality::quality(), and we need to override charsForHelloWorld(). If we chose to inherit privately from it in CppQuality, we get both of these, yet none of the members of LanguageQuality will be accessible by the users of that class (as shown in the last post).

Here is a version of CppInfo that inherits privately from LanguageQuality:

class CppInfo_UsingInheritance : private LanguageQuality
{
public:
    string info()
    {
        return "Language: C++\nCreator : Bjarne Stroustrup\nQuality : " + to_string(quality());
    }
private:
    int charsForHelloWorld() const
    {
        return string("#include <iostream>\nint main() { std::cout << "Hello World"; }").size();
    }
};

I have to admit, this looks pretty neat. The alternative is to create a new class whose only responsibilty is overriding charsForHelloWorld():

class CppQuality : public LanguageQuality
{
protected:
    int charsForHelloWorld() const 
    {
        return string("#include <iostream>\nint main() { std::cout << \"Hello World\"; }").size();
    }
};

And then have this as a member in CppInfo:

class CppInfo_UsingComposition 
{
public:
    string info()
    {
        return "Language: C++\nCreator : Bjarne Stroustrup\nQuality : " + to_string(cppQuality.quality());
    }
private:
    CppQuality cppQuality;
};

Now you have two classes instead of one, and 18 lines of code instead of 13. Which solution is best? I am not sure, so let’s have a look at the arguments:

The solution using inheritance is smaller. All the code is in one place, so it is easy to get an overview. We have however coupled our class CppInfo very tightly to LanguageQuality, even though their responsibilities aren’t directly related. Considering the single responsibility principle, CppInfo is about printing general info about C++, whereas LanguageQuality is about computing the quality of a programming language.

I think the principle that provides the tipping point for me in this example, is that every piece of your software should have a single reason to change. Say we want to change how we compute the quality of a language (the number of characters required for Hello, World just doesn’t cut it anymore). This would require a change to LanguageQuality. We would then probably need to update all its derived classes. And to change a class, you really should understand all its responsibilities, invariants and effects. All the other code in CppInfo has nothing to do with the change we are making, so we shouldn’t have to worry about it. Also, if we want to change what information we print out about C++ (say we want to add the publication year of the latest language standard), we want to understand how info is printed, not how the quality is computed.

Again, in a small example like this, it might not matter much, but in a larger example it will. And what starts out small has a nasty tendency to grow, so you’d better care about design from the very start.

Just for fun, here is the result after adding JavaInfo and PythonInfo:

Language: C++
Creator : Bjarne Stroustrup
Quality : 1.587302
Language: Java
Creator : James Gosling
Quality : 0.970874
Language: Python
Creator : Guido van Rossum
Quality : 5.263158

Unsurprisingly, we beat Java, but lose to Python.

Now I am interested to hear your thoughts, which solution would you choose, and why?

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Notes From My GoogleTest Demo

I recently gave a demo of GoogleTest at Kjeller Software Community / Oslo C++ Users Group. These are the notes from my demo. My apologies to those who did not attend, these might not make much sense unless you saw the demo.

Installation

See my post about installation and setup of GoogleTest in your project.

Test types

TEST(TestCaseName, TestName)
A single test TestName belonging to the TestCaseName test case. The TEST macro converts this into class TestCaseName_TestName

TEST_F(TestFixtureName, TestName)
A single test TestName belonging to the TestFixtureName test case, which uses a fixture. The TEST_F macro converts this into class TestFixtureName_TestName : public TestFixtureName. You need to implement the TestFixtureName : public ::testing::Test class also.

TEST_P(TestFixtureName, TestName)
A single parametrized test TestName belonging to the TestFixtureName test case, which uses a fixture with a parametrized interface. You need to implement the TestFixtureName : public ::testing::WithParamInterface class also. Use GetParam() to get the parameter, and INSTANTIATE_TEST_CASE_P to instantiate tests.

Finally remember that you can use SetUp() and TearDown(), but might just as well use the plain constructor and destructor. A reason to use TearDown() instead of the destructor is if it might throw an exception (which destructors should not do).

Assertions

Here are some examples of assertions:
ASSERT_TRUE
ASSERT_FALSE
ASSERT_EQ
ASSERT_DOUBLE_EQ
ASSERT_LT
ASSERT_THROW

You can also write your own assertion functions:
AssertionResult predicateFunction(...)

Or just write a void function that does all the asserts internally:
void assertWhatever()

Some final tips

Use one test project per production project, for instance NoFlyListTest for NoFlyList. This makes it easy to find the tests for a project. If the rest of your code is decoupled, it might also speed up linking a lot, since you only need to link the test project, and not your entire solution. This especially helps if you are doing TDD, in which you will typically compile and link several times a minute.

Write short tests, and only test one thing per test. This will make it easy to figure out what went wrong when tests break in the future.

Write tests before you implement the code that passes them. This will verify that the test didn’t pass just by accident. It will also let you see how the failing test looks like, so you can make sure it is self-explanatory enough that someone will understand what went wrong when it breaks.

Don’t be a terrorist, you’ll end up in a bad place.

Source Code

If you want to have a look at the code from my Kjeller Software Community demo, it is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Installing and Using GoogleTest with Visual Studio

Important note: This blog post is one of the top hits on Google for “googletest visual studio”. It is however quite old, and might no longer reflect the best way to use GoogleTest with Visual Studio.

Google C++ Testing Framework (aka. GoogleTest) is a unittesting framework for C++. This post describes how to install it, and set it up in your project. I am using GoogleTest 1.6.0 here, but other versions should be similar. The instructions provided are for Visual Studio 2010, but 2012 should be exactly the same.

Installation

First of all, download the latest version from the GoogleTest download page, and unzip it. Inside, you will find a directory called msvc, which contains the Visual Studio solutions:

In this directory, you will find two solutions, gtest.sln, and gtest-md.sln. Which one you want depends on whether you are using a static or dynamic runtime. If you are unsure which one to use, take a look in your existing solution:

If you are using the DLL version of the runtime, use the gtest-md.sln solution, otherwise use gtest.sln. Before you open the solution though, make sure it is not read only, as Visual Studio will want to convert it to your version:

Open the solution you want, agree to convert the solution. Make sure you build it both in Debug and Release versions. The resulting libraries end up in gtest-1.6.0\msvc\gtest\Debug and gtest-1.6.0\msvc\gtest\Release, respectively. This is a good time to copy the libraries to wherever you keep libraries for your projects. The files you will need are gtestd.lib and gtest_maind.lib from the Debug directory, and gtest.lib and gtest_main.lib from the Release directory. In addition, you need all the headers from gtest-1.6.0\include. (Of course, you could just copy the entire gtest-1.6.0 directory and not care about which files you need.)

Setup for Your Project

I suggest to use one test project per production project. This makes it easy to find the tests you are looking for. Also, if your code is nicely decoupled, you might be able to link just these two projects, and not your entire solution. This can speed up your “red-green-refactor” cycle considerably. Finally, this makes it easy to exclude your test code from the final binary you ship. Here is an example from my Kjeller Software Community presentation:

Set the following properties for your test project:

Make sure to set up the test project to use the same runtime library as your production project (MT / MTd / MD / MDd).
Add an additional include directory c:\wherever\you\put\gtest\include
Add an additional library directory c:\wherever\you\put\the\libs
Under Linker -> Input , add a dependency on gtest.lib for your Release configuration, and gtestd.lib for your Debug configuration. Unless you want to write your own main function that runs all the tests, also add a dependency on gtest_main.lib / gtest_maind.lib, respectively. This will add a main() method to your project which discovers and runs all the tests.
Under Properties -> Linker -> System, set SubSystem to Console, to keep the test-window open after the tests have run.

Also make sure that your test project depends on the production project:

And that’s it! Now you can start writing and running tests, but since the documentation already describes that pretty well, I will not go into that here. If you want to have a look at the code from my Kjeller Software Community demo, it is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

C++ Puzzle #1: Initialization Order

Time for another puzzle! What is the output of this program?


class Parent
{
public:
    Parent(int p=0.0) : p(p)
    {
        cout << "Parent(" << p << ")" << endl;
    }

    int p;
};

class Member
{
public:
    Member(int m=0.0) : m(m)
    {
        cout << "Member(" << m << ")" << endl;
    }

    Member& operator=(const Member& rhs)
    {
            cout << "Member is copied" << endl;
            m = rhs.m;
            return *this;
    }

    int m;
};

class Derived : public Parent
{
public:
    Derived() : foo(10), bar(foo*2)
    {
        Parent(7);
        m = Member(42);
    }

    int bar;
    int foo;
    Member m;
};


int main()
{
    Derived d;
    cout << d.p << " " << d.foo << " " << d.bar << " " << d.m.m << endl;
}

Answer: Undefined behaviour! Which means it could output whatever, crash at any point, or format your hard drive. Can you spot the source of the undefined behaviour? Hint: Look for the use of an uninitialized variable.

Did you see that the order of foo and bar in the initializer list is not in the same order as their declarations? Members are always initialized in the order of declaration, not in the order in the initialization list. This means bar is initialized before foo, using an uninitialized foo to compute its value. If you compile with all warnings enabled (always a good idea), your compiler should warn you about this. For instance, g++ tells me:

main.cpp: In constructor ‘Derived::Derived()’:
main.cpp:29:9: warning: ‘Derived::foo’ will be initialized after [-Wreorder]
main.cpp:28:9: warning:   ‘int Derived::bar’ [-Wreorder]
main.cpp:22:5: warning:   when initialized here [-Wreorder]

Next question then, what is the likely output of this program on a real compiler? On my Ubuntu 12.04 using g++ 4.6, I get:

   
Parent(0)
Member(0)
Parent(7)
Member(42)
Member is copied
0 10 8393920 42

Lets’ walk through what happens here.

1: Derived inherits from Parent. C++ guarantees that all parent objects are fully constructed before the constructor of the derived class is invoked, so the first thing that happens is that the default constructor of Parent is called. (Did I fool you with Parent(7); in the constructor though? That line will just create a local object that is never used. Had I moved it to the initializer list, it would have been used instead of the default constructor.)
2: Before the body of Derived‘s constructor, all its members will be initialized. Since we don’t specifically initialize Member m in the initializer list, it is first automatically default constructed, and then re-assigned in the body of the constructor (as seen in line 5 of the output).
3: On line 34 of the program, we create a local Parent object which is never used. Maybe we meant to set p to 7, but put the Parent() call in the wrong place?
4-5: Now another Member is constructed, and copy assigned to m. All this re-construction and copying could have been avoided if we had moved the call Member(42) to the initializer list.
6: Finally, we have a look at the resulting values of Derived‘s members:
- p is 0, not 7 as we maybe meant it to be, as explained in 1.
- foo is 10, hopefully as expected.
- bar is 8393920, due to the use of the uninitialized foo, as explained earlier. This is entirely by chance, and should not be relied on! Your program might just as well crash, or do something worse.

Finally let’s clean up the program and see what happens. We reorder the declarations of foo and bar, and move all initializations to the initializer list:

ERROR: Couldn't open file: [Errno 2] No such file or directory: '/home/anders/Documents/code/blog/initialization/main2.cpp'

Now the program is well defined, and free of repeated constructors and copying. Here is the output:

Parent(7)
Member(42)
7 10 20 42

Much better! Full source here.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Why Return Codes Make Me Mad

In which I go on and on complaining about return codes, letting off some steam.

So, return codes. They make me mad. When given the choice, I always prefer exceptions.

Let’s say you have a function getValue(), which returns, well, the value of something. However, if something went wrong during setup, the value might never have been set, and we need to detect that. What we would like to write is

type value = getValue(); //For some type bool / int / WhateverClass

We do however need to detect error conditions. Either we use exceptions, or we use return codes. We can do either

try {
    type value = getValue();
} catch (NotInitializedException&) {
    //handle
}

type value;
bool ok = getValue(value)
if (!ok) {
    //handle
}

Either way you need to look out for errors, and maybe one isn’t so much worse than the other. What return codes do to the flow of you code however, is much worse. Here are some of the things that annoy me the most:

The size of the code explodes
An innocent, readable line like

if (getValue() == whatever)

turns into this mess:

type value;
getValue(value)
if(value == whatever)

Which one is easiest to read?

And of course, you can’t do this anymore:

finalResult = doStuffWith(getValue());

Variables have intermediate, invalid values
In the previous example, value needs to be constructed before getValue() is called. There’s a whole list of problems with this:

If it is a fundamental type (int, float, bool etc.) you have to manually initialize it, or you will have a dangerous uninitialized variable.
Some types don’t even have a default constructor
Some types are expensive to construct
Some constructors have side effects
value will have a meaningless value for a (short) part of the life of the program

You cannot use const variables.
I always make variables const whenever I can. This helps with program correctness, because I will never accidentally modify them, or pass them to a function that modifies them. But now you can’t do

const type value = getValue();

you have to do

type value; //Can't use const
getValue(value) //because it's modified here

Error checking code will infect your code and make it unreadable

Instead of handling errors in one place like this:

try {
    const type value = getValue()
    const othertype othervalue = getRelatedValue(value);
    doStuff(othervalue);
} catch (NotInitializedException&) {
    //handle;
}

You have to check for errors all the time:

type value;
bool ok = getValue(value);
if (ok) {
    othertype othervalue;
    bool otherok = getRelatedValue(othervalue, value);
    if (otherok) {
        doStuff(othervalue);
    } else {
        //handle
    }
} else {
    //handle
}

Look at this mess! Notice how the error handling code gets intermixed with the happy code. In the version of the code using exceptions, this is not necessary, as the rest of the try block is automatically skipped when an exception is thrown.

Constructors can’t even return a return code
The return value of a constructor always is the constructed object, so here you don’t even have a choice. If you don’t want to use exceptions, you need to remember that the object is now in an invalid state. So either you hope your users remember to do

SillyType s;
if (!s.gotProperlyConstructed() {
    //handle
}

or you need to make the newly constructed object remember that it is invalid, and return that as an error code indicating that every time someone calls a method on it from now on.

The code does not say what it really means
An argument to a function is an argument to a function. The return value of a function is the return value of a function. When reading a piece of code, this is the natural way to reason about it. When arguments are suddenly used for return values, the natural flow of the code is broken.

That’s not all
In addition, there’s a host of other problems like:

It’s easy to forget to check a return code, leaving your program silently in an invalid state. However, if you forget to catch an exception, you program blows up and at least you know something is wrong.
Operators can’t use return codes.
Exceptions tend to have useful information attached to them, describing the error. Return codes are usually enums, integers or booleans. (This can be fixed though, by returning an error object instead of an error code.)

Readable code
In all of the examples above, return codes lead to code that is less readable. And for me, readable code is extremely important. When your code is unreadable:

It is easier for bugs to hide
It is harder to make changes
It takes longer to get to know the code for new programmers (which might be yourself, half a year later)
It is usually harder to write tests
It is scary to refactor because you don’t understand the code fully, so you end up patching on another few lines and another few lines instead

In short, unreadable code leads to more bugs, longer development time for new features, more time spent in the debugger, and more time spent on maintenance. This is why I hate unreadable code. This is why return codes make me mad.

What can you do?
For new projects, go with exceptions. For legacy code you usually need to stay with the existing style. If all the code around you uses return codes, you probably should too. I would however argue that there still can be a place for exceptions: If you are extending the code base with an isolated module, or doing a major refactoring of one, and the interface to the rest of the code is small, use exceptions internally. Then catch the exceptions on the boundary and translate to return codes. After a while, you will gradually move towards exceptions and more readable code for the entire code base.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Avoiding Heap Allocations With Static Thread Locals

In which I show how to use static thread locals to reuse function locals between invocations, and why this can be especially important when allocating on the heap.

Let’s say you are writing parallel software to process something. It has a single Processor object, with a member function process(). This function gets called from several threads. To complete its task, it depends on a Helper object (ignore the output for now).

class Processor
{
public:
    void process(char data, int iteration)
    {
        cout << "Iteration " << iteration << " ";
        Helper h(data);
        //Use the helper object
        cout << endl;
    }
}

class Helper 
{
public:
    Helper(char id)
    {
        cout << "Helper created for thread " << id;
    }
};

The data parameter stands in for some data structure that is unique to each thread, so we need a unique Helper for each thread. We assume that the Helper object never has to change inside a thread.

In this first example, the Helper object is a local variable, but this means a new instance gets created and destroyed for each invocation. If process() is called a lot, we might want to avoid this. Here is how to solve that using static thread locals:

    void process_threadlocal(char data, int iteration)
    {
        cout << "Iteration " << iteration << " ";
        static thread_local Helper* h = nullptr; 
        if (h == nullptr)
            h = new Helper(data);
        //Use the helper object
        cout << endl;
    }

(Note that the null check is there because I am using GCC which doesn’t support proper C++11 thread_local yet. In the future (or on some other compilers) you should be able to just type static thread_local Helper h* = new Helper(data);, or even better: static thread_local Helper h(data);)

Let’s have a look at the client code. The following main simulates three threads (named A, B and C) running 3 iterations of process each, first using the regular process(), then using process_threadlocal():

int main() 
{
    Processor processor;

    cout << endl << "LOCAL" << endl;
    for (char thread_id = 'A'; thread_id <= 'C'; ++thread_id) {
        thread([&]() {
            for (size_t i = 0; i < 3; ++i) {
                processor.process(thread_id, i);
            }
        }).join();
    }

    cout << endl << "STATIC THREAD LOCAL:" << endl;
    for (char thread_id = 'A'; thread_id <= 'C'; ++thread_id) {
        thread([&]() {
            for (size_t i = 0; i < 3; ++i) {
                processor.process_threadlocal(thread_id, i);
            }
        }).join();
    }

}

(For simplicity, we wait for each thread to finish before starting the next.)

Here is the output:

LOCAL:
Iteration 0 Helper created for thread A
Iteration 1 Helper created for thread A
Iteration 2 Helper created for thread A
Iteration 3 Helper created for thread A
Iteration 4 Helper created for thread A
Iteration 0 Helper created for thread B
Iteration 1 Helper created for thread B
Iteration 2 Helper created for thread B
Iteration 3 Helper created for thread B
Iteration 4 Helper created for thread B
Iteration 0 Helper created for thread C
Iteration 1 Helper created for thread C
Iteration 2 Helper created for thread C
Iteration 3 Helper created for thread C
Iteration 4 Helper created for thread C

STATIC THREAD LOCAL:
Iteration 0 Helper created for thread A
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 0 Helper created for thread B
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 0 Helper created for thread C
Iteration 1
Iteration 2
Iteration 3
Iteration 4

As you can see, one Helper is created for each thread, but by using static thread locals, it is only created once per thread.

But you said heap allocations!

I did. In the first example, I allocate the Helper on the stack. This might be a performance hit in itself, at least if the Helper constructor or destructor is expensive. What is far worse however is if Helper is allocated on the heap, or if any object it creates are (or any objects created by the ones it creates etc.).

To be thread safe, operator new needs to lock the entire heap. That means all other threads that need to access the heap block every time a thread allocates something on the heap. This can ruin your parallel performance. Andrew Binstock wrote very interesting article in Dr. Dobbs Go Parallel about this, which I highly recommend.

But, but, memory leak!
Nicely spotted. There is a memory leak in my program, since I never delete the Helpers created in process_threadlocal. This might or might not matter, depending on your situation. The program will never leak more than one object per thread, so you might just ignore the problem as memory is freed at the exit of the program anyway.

If Helper needs to free some other resource however, or do something else in its destructor, one thing you could do is wrap the Helper* pointer in an unique_ptr. Then the destructor will be called for each Helper when the program terminates and the static thread_local unique_ptr<Helper> is destroyed.

The reason I am not doing that in this example is that GCC doesn’t support non trivial thread local objects yet (It doesn’t have C++11 thread locals, but uses its own __thread which I have defined thread_local to).

And finally, the usual disclaimer: Don’t go optimizing like this unless you have profiler data to back that decision. Until proven otherwise, readable code always wins!

As usual, the code for this blog post is available on GitHub. I have only tested it with GCC, but I guess it should work in Visual Studio 2011 as well.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

A String Literal is not a `string`

The other day, I discovered a bug in some code that can be simplified to something like this:

A library of functions that handles a lot of different types:

void doSomethingWith(int i)           { cout << "int"    << endl; };
void doSomethingWith(double d)        { cout << "double" << endl; };
void doSomethingWith(const string& s) { cout << "string" << endl; };
void doSomethingWith(const MyType& m) { cout << "MyType" << endl; };

Used like this:

    doSomethingWith(3);
    doSomethingWith("foo");

It of course outputs:

int
string

Then someone wanted to handle void pointers as well, and added this function to the library:

void doSomethingWith(const void* i) { cout << "void*" << endl; };

What is the output now? Make up your mind before looking.

int
void*

What happened? Why did C++ decide to use the const void * function instead of const string& that we wanted it to use?

The type of a string literal is not string, but const char[]. When deciding on the overloaded function to use, C++ will first see if any of them can be used directly. A const void* is the only type in our example than can point directly to the const char[], so that one is picked.

Before that function was introduced, none of the functions could be used directly, as neither const string& nor const MyType& can refer to a const char[], and it cannot be cast to an int or a double. C++ then looked for implicit constructors that could convert the const char[] into a usable type, and found std::string::string(const char * s). It then went on to create a temporary std::string object, and passed a reference to this object to void doSomethingWith(const string& s), like this:

doSomethingWith(std::string("foo"))

But then, when the const void* version appeared as an alternative, it preferred to use that one instead as it could be used without constructing any temporary objects.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

	Anders Schau Knatten on Microsoft C++ versions ex…
	Anonymous on A prvalue is not a tempor…
	A prvalue is not a t… on A prvalue is not a tempor…
	A prvalue is not a t… on lvalues, rvalues, glvalues, pr…
	A prvalue is not a t… on A prvalue is not a tempor…

1: Reducing the number of includes

Reducing the content of includes

Installation

Test types

Assertions

Some final tips

Source Code

Installation

Setup for Your Project

1: Reducing the number of `include`s