Don’t Put All Includes in the .hxx


Ok, one more thing about headers, then I promise to talk about something else.

Just because a header file is called a header file doesn’t mean you should put all your includes there. Often a source file, where the definitions of your functions and classes live, needs quite a lot of headers to do its work. It might for instance need <algorithm> to do some sorting or searching, a fact your declarations don’t need to worry about:

processing.hxx

#include "MyType.hxx"

void process(MyType obj); 

processing.cc

#include "processing.hxx"
#include <algorithm>

void process(MyType obj) {
    std::find(...);
    std::sort(...);
}

You almost always need to include some headers in the .hxx, so why not put them all there? While compiling your own .cc file, it doesn’t really matter, since all those files are included anyway. But when someone else includes your header, they suddenly depend on everything your implementation depends on. This also adds to compilation time for your users. So do your includes as shown above.

Use Your Head Before Using Using in Headers


Speaking of headers, did you ever think about what you are really doing when you put using namespace my_util in a header file? Sure, it might be annoying to namespace-qualify all the types in your declarations. But using namespace my_util forces all the names in my_util into the global namespace of everyone who includes your header file.

As usual, here is a small example:

my_iface.hxx:

bool check(my_types::Foo foo, my_types::Bar bar);
bool listen(my_types::Foo foo, my_types::Bar bar);
bool digest(my_types::Foo foo, my_types::Bar bar);

The repeated namespace qualifying gets annoying, so you would rather do:
my_iface.hxx:

using namespace my_types;
bool check(Foo foo, Bar bar);
bool listen(Foo foo, Bar bar);
bool digest(Foo foo, Bar bar);

This is however a bad idea. “using namespace my_types” (a using directive) makes every name in my_types visible in the global namespace of everyone who includes the header, inviting name clashes in code you have never seen. You could argue that a using declaration is better, as “using my_types::Foo” only places Foo into the global namespace. I would however disagree, as a smaller sin is still a sin.

And you’re not getting away that easily: this argument is also valid for std! Yes, you need to type my_func(std::vector<int> v) in header files.

A final trick you might try is to only use using namespace inside of your other namespaces, like so:
my_iface.hxx:

namespace my_iface {
  using namespace my_types;
  bool check(Foo foo, Bar bar);
  bool listen(Foo foo, Bar bar);
  bool digest(Foo foo, Bar bar);
}

This is still a bad idea. You are not polluting the global namespace any more, but you are polluting your own. There are at least two problems with this approach. First, someone who uses your library might now accidentally qualify Foo as my_iface::Foo. If you ever decide to clean up my_iface and remove the using directive, that poor guy’s code is going to break. Second, if this guy decides he wants to have using namespace my_iface somewhere in his code, he probably doesn’t want all your other namespaces (and std, boost or whatever) included as well.

As usual, normal politeness applies: A one-time annoyance for you is better than a repeated annoyance for all your users.
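One way to keep the typing down without touching anyone else’s namespaces is to qualify names in the header and confine any using declaration to a function body in the source file. The sketch below squeezes both files into one listing. The contents of my_types and the body of check() are made up for illustration.

```cpp
// Hypothetical contents of my_types, invented for this sketch.
namespace my_types {
struct Foo { int weight; };
struct Bar { int limit; };
}

// --- my_iface.hxx: fully qualified, nothing leaks to includers ---
bool check(my_types::Foo foo, my_types::Bar bar);

// --- my_iface.cc: a using declaration scoped to one function body ---
bool check(my_types::Foo foo, my_types::Bar bar) {
    using my_types::Foo;  // visible only until the closing brace
    const Foo heaviest_allowed{bar.limit};
    return foo.weight <= heaviest_allowed.weight;
}
```

The using declaration here ends with the function, so nobody who includes the header ever sees it.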

The Order of #include Directives Matters


I think most (good) programmers tend to prefer tidy code, having some habits or rules to stick to. One such rule/habit is the order of #include directives. While having a rule of thumb to stick to is good, some are better than others.

A very common one I think is to first include standard library headers (<iostream>), then third-party libraries (<boost/thread.hpp>) and finally local headers (“FizzBuzz.hpp”). While this might be the recommended way of doing it in for instance Python, it is not the best way to do it in C++. A colleague of mine just went through a lot of pain when swapping out a big library in their codebase. Why was that?

Imagine you are writing a library that depends on some other library, like the STL (hardly any library doesn’t). Imagine you are a good boy (or girl) and write the test first. You might do something like this:

#include <vector> //STL
#include <gtest/gtest.h> //ThirdParty 
#include "geometry.hpp" //In-house

#define PI 3.2 //As pr. Indiana Bill #246, 1897

TEST(TestGeometry, rotatingOrigoGivesOrigo) {
    std::vector<double> v(2,0);
    rotate(v, PI);
    EXPECT_EQ(0, v[0]) << "X got moved!";
    EXPECT_EQ(0, v[1]) << "Y got moved!";
}

and your geometry.hpp looks like this:

void rotate(std::vector<double>& v, double angle);

This all works out nicely until someone else wants to include geometry.hpp without including <vector> first, and gets something like geometry.hpp:1: error: ‘vector’ is not a member of ‘std’. You have now forced all the users of your geometry library to include <vector> before including geometry.hpp.

While this can seem like a trivial example, this stuff quickly grows a lot hairier when you have a larger codebase with lots of dependencies. A header file A might depend on header B which depends on C, which declares types you have never heard of, and rightly so. And when you try to compile, you get error: ‘würkelschmeck’ is not a member of ‘std’. Or something a lot worse if templates are involved.

My suggested rule is:

  1. Local headers
  2. Third party headers
  3. STL headers
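Whatever order you settle on, the underlying fix is for every header to include what it needs itself. Putting the local header first in the matching source or test file is what makes the compiler verify that. Here is a single-listing sketch of the two files, assuming rotate() rotates a 2D point about the origin (the post never shows its body). The rotated() helper is hypothetical as well.

```cpp
// --- geometry.hpp: includes what it uses, so it compiles on its own ---
#include <vector>

void rotate(std::vector<double>& v, double angle);

// --- geometry.cpp: would start with #include "geometry.hpp", then the rest ---
#include <cmath>

// Assumed behaviour: rotate the 2D point (v[0], v[1])
// counter-clockwise by angle radians.
void rotate(std::vector<double>& v, double angle) {
    const double x = v[0];
    const double y = v[1];
    v[0] = x * std::cos(angle) - y * std::sin(angle);
    v[1] = x * std::sin(angle) + y * std::cos(angle);
}

// Hypothetical helper for trying it out: rotates a copy and returns it.
std::vector<double> rotated(std::vector<double> v, double angle) {
    rotate(v, angle);
    return v;
}
```

With <vector> included by geometry.hpp itself, a file whose first line is #include "geometry.hpp" compiles no matter what else it includes afterwards.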


The Simplest Possible Introduction to Traits


This is the simplest introduction to traits I could think of:

A trait is information about a type stored outside of the type.

Imagine you are making a templated algorithm, and want to use different policies for different types. If you wrote those types, you could implement the policy selection in your types directly, but this is generally not possible for built in types and types you haven’t created yourself. Traits to the rescue!

Given

template <class T>
void f(T var) {
    bool advanced = policy_trait<T>::advanced;
    if ( advanced )
        cout << "Do it fancy!" << endl;
    else
        cout << "KISS." << endl;
}

you can define policies for whatever type you want by creating template specializations of policy_trait:

template <>
struct policy_trait<char> {
    static const bool advanced = false;
};

That’s it! :)

(The original paper on traits is Traits: a new and useful template technique by Nathan C. Myers, but An introduction to C++ Traits by Thaddaeus Frogley is probably a more intuitive introduction.)
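The snippets above leave out the primary template that the specialization refers to. A complete sketch could look like the following. The default of advanced = true for unspecialized types is my assumption, and describe() is a hypothetical variant of f() that returns the message instead of printing it.

```cpp
#include <string>

// Primary template: the policy for every type without a specialization.
// Defaulting to advanced = true is an assumption made for this sketch.
template <class T>
struct policy_trait {
    static const bool advanced = true;
};

// Specialization from the post: char opts out of the fancy path.
template <>
struct policy_trait<char> {
    static const bool advanced = false;
};

// Same selection logic as f(), but returning the message (hypothetical helper).
template <class T>
std::string describe(T) {
    return policy_trait<T>::advanced ? "Do it fancy!" : "KISS.";
}
```

Note that neither char nor int had to be modified to take part, which is the whole point of storing the information outside the type.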

Why You Shouldn’t Throw in Destructors


In my post two weeks ago, What’s the Point of Function Try Blocks, I mentioned that you shouldn’t throw in destructors. Why exactly is this?

Imagine the following:

void f() {
  MyClass c;
  functionThatMightThrow();
}

If functionThatMightThrow throws, stack unwinding occurs. What this means is basically that all the objects on the stack get destructed, from the innermost block and out. In this case, c is destroyed before the exception is thrown out of f.

But what happens if ~MyClass() throws? There would now be two simultaneously active exceptions, something that luckily is forbidden by the standard. Instead, terminate() is called, and the whole program stops.

So what can you do? Design around it. Log and swallow. Ignore. Abort. If your class is managing some resource that should be cleaned up, but failing to do so is not critical, you could also provide a close() method that throws on error. A client who feels it is necessary to be 100% sure the resource is cleaned up will then have the option of calling close() and handling the exception himself, before your destructor is called.

Note that relying on the client to remember to call close(), setUp(), init() etc. is generally a bad idea, but might be acceptable in some cases. What makes it acceptable in this case is if cleaning up the resource is optional.
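A minimal sketch of the close() approach, using a made-up Connection class whose fail_on_close flag simulates a cleanup error. Keep in mind that since C++11 destructors are implicitly noexcept, so an exception escaping one calls std::terminate() even when no other exception is in flight.

```cpp
#include <stdexcept>

// Hypothetical resource wrapper; fail_on_close simulates a cleanup error.
class Connection {
public:
    explicit Connection(bool fail_on_close)
        : fail_on_close_(fail_on_close), closed_(false) {}

    // Explicit cleanup: reports failure by throwing.
    void close() {
        if (closed_) return;
        closed_ = true;  // never retry, even if the cleanup fails
        if (fail_on_close_) throw std::runtime_error("close failed");
    }

    // Fallback cleanup: must not let anything escape.
    ~Connection() {
        try {
            close();
        } catch (...) {
            // log and swallow
        }
    }

private:
    bool fail_on_close_;
    bool closed_;
};

// A client that cares calls close() and sees the error.
bool explicit_close_reports_error() {
    try {
        Connection c(true);
        c.close();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

// A client that does not care lets the destructor handle it quietly.
bool destructor_swallows_error() {
    { Connection c(true); }
    return true;  // reached only because ~Connection() did not throw
}
```

Because close() marks the connection as closed before it can throw, the destructor that runs during stack unwinding has nothing left to do and cannot throw a second exception.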

Why Algorithms Need Two Iterators


When using algorithms in C++, such as find(), sort(), etc., you might wonder why you need to specify both the beginning and the end of the container, and not just a reference to the container itself. See for instance:

vector<int> v; //Assume it also has some elements
sort(v.begin(), v.end());

Wouldn’t it be better if we could just write

sort(v);

and let sort call begin() and end()?

If this were the case, algorithms would only work on containers that define begin() and end(). Algorithms in C++ do however work on anything that can hand out iterators. And an iterator is not a type, it is a pure abstraction. Anything that behaves like an iterator is an iterator. If a type has a concept of the element currently pointed to, pointing to the next element and equality, it can be an iterator.

One important example is pointers. C++ algorithms work with built-in arrays, because pointers behave as iterators. A sort() that received only a pointer to the array could not make this work:

int a[42];
sort(a);

How would sort get hold of iterators (pointers) to the beginning and end of the array? The beginning is easy, but since an array decays to a pointer to its first element when passed to a function like this, its length is lost and the end is not in sight. For this to work, one would have to have special versions of all algorithms, taking the size of the array as an extra argument. This would lead to a loss of generality, and double the number of algorithms.

Passing two iterators however preserves generality, and can be used with anything that behaves like an iterator.
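A short sketch of what the iterator pair buys you: the same std::sort handles a built-in array through two pointers, and just the middle of a vector through two vector iterators. The function name and the sample values are made up for the example.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Returns true if both calls to std::sort did what the comments claim.
bool iterator_pairs_work() {
    int a[5] = {4, 1, 3, 5, 2};
    std::sort(a, a + 5);  // two pointers act as the iterator pair
    // Equivalent while the array type is still known (C++11):
    // std::sort(std::begin(a), std::end(a));

    std::vector<int> v = {9, 8, 7, 6, 5};
    std::sort(v.begin() + 1, v.end() - 1);  // sort only the middle three

    const std::vector<int> expected = {9, 6, 7, 8, 5};
    return std::is_sorted(std::begin(a), std::end(a)) && v == expected;
}
```

A sort(v) that only took the container could not express the second call at all.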

Note that algorithms are not defined to take a specific iterator type, like this:

void sort(iterator begin, iterator end); //All iterators inherit from iterator

Instead, they are defined like this:

template <class It>
void sort(It begin, It end); //The type of It can be whatever

First, this lets us use pointers as iterators. Pointers of course do not inherit from an iterator type. Second, this relieves us of runtime polymorphism (which could negatively impact both speed and size). If you try to use a type that does not provide the necessary iterator operators, the compiler will catch it when sort() tries to use them.

This is typical for templates; with compile time polymorphism, anything that provides the interface used by the templated function is acceptable, regardless of what (if anything) it inherits from. This is often referred to as duck typing: “when I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.” This is also what allows you to make a vector of any type. Note that this is different from for instance Java, which uses inheritance for collections instead of duck typing. In Java, all classes inherit from Object, and you can only have collections of objects, not primitive types.
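To make the duck typing concrete, here is a sketch with a hand-written algorithm and a hand-written iterator. Both sum_range() and CountingIt are hypothetical names. CountingIt inherits from nothing and only provides the three operations the template happens to use.

```cpp
#include <vector>

// Works with anything that supports !=, ++ and * on the iterator type.
template <class It>
int sum_range(It begin, It end) {
    int total = 0;
    for (; begin != end; ++begin)
        total += *begin;
    return total;
}

// Not derived from any iterator base class; it just quacks like one.
struct CountingIt {
    int value;
    int operator*() const { return value; }
    CountingIt& operator++() { ++value; return *this; }
    bool operator!=(const CountingIt& other) const { return value != other.value; }
};

// The same template instantiated for pointers and for vector iterators.
int sum_of_array() {
    int a[3] = {1, 2, 3};
    return sum_range(a, a + 3);
}

int sum_of_vector() {
    std::vector<int> v = {10, 20, 30};
    return sum_range(v.begin(), v.end());
}
```

sum_range(CountingIt{1}, CountingIt{5}) adds up 1, 2, 3 and 4 without any container existing at all.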

What’s the Point of Function Try Blocks


This post is a follow up to The Returning Function that Never Returned, which I wrote a couple of months ago. You can read it first if you want, it is only around a hundred words, but this post can be read on its own as well.

In my previous post, I presented a function with try outside of the braces. This is called a “function try block”. The example went like this:

std::string foo() try {
    bar();
    return "foo";
} catch (...) {
    log("Unable to foo!");
}

This is completely useless, even dangerous. The try should be moved inside the function itself, like this:

std::string foo() {
    try {
        bar();
        return "foo";
    } catch (...) {
        log("Unable to foo!");
        //either rethrow or throw something else
    }
    //or make sure something is returned
}

So why were function try blocks introduced to the language in the first place? Have a look at this:

class A : public B {
    A(int x) : c(x) {}
    C c;
};

How do you catch an exception thrown by C’s constructor? Having try inside the function block is too late. Initializing c inside the function block is not a solution either, since c will always be initialized before the body of the constructor, even if it is not mentioned in the initializer list. The only way to catch such an exception is to use a function try block on the constructor, and that is also why they were introduced. (Note that this argument is also valid if B’s constructor might throw.)

This is also the only sane case in which to use them. The other two candidates are regular functions and destructors, both of which I will get back to in a moment.

First, let’s have a look at what you can do with a function try block on a constructor. When the catch block reaches its end, it will rethrow the exception. It is impossible to swallow it. If you could swallow the exception, the code that tried to construct an object of this type would have no way of knowing that construction failed. A failed construction means the object doesn’t exist, and it doesn’t make sense to continue pretending nothing has happened. The only thing you can do is to throw an exception of another type, or cause a side effect (such as logging) and then rethrow. In particular, you cannot try to recover from the problem.
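A compact sketch of such a constructor, translating the member’s exception into another type. The base class B is left out for brevity, and ConstructionFailed and try_construct() are hypothetical names. The C used here is a stand-in that throws on negative input.

```cpp
#include <stdexcept>
#include <string>

// Stand-in member type whose constructor may throw.
struct C {
    explicit C(int x) {
        if (x < 0) throw std::invalid_argument("negative");
    }
};

// Hypothetical exception type that A reports instead.
struct ConstructionFailed : std::runtime_error {
    explicit ConstructionFailed(const std::string& what)
        : std::runtime_error(what) {}
};

class A {
public:
    // try comes before the initializer list, so it covers c(x) as well.
    explicit A(int x) try : c(x) {
    } catch (const std::invalid_argument& e) {
        // Translate; reaching the end here would rethrow the original instead.
        throw ConstructionFailed(std::string("A: ") + e.what());
    }

private:
    C c;
};

// Reports what the caller of the constructor gets to see.
std::string try_construct(int x) {
    try {
        A a(x);
        (void)a;
        return "ok";
    } catch (const ConstructionFailed& e) {
        return e.what();
    }
}
```

The caller still has to handle an exception. The function try block only got a say in which one.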

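The same goes for destructors: when the end of the catch block is reached, the exception is rethrown. (Unlike in a constructor, an explicit return in the handler is allowed and does swallow it.) And since you really should avoid having destructors throw, using function try blocks on destructors is a bad idea.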

Try blocks on regular functions behave a bit differently; if the end of the catch block is reached, the function simply returns. But if you have a non-void function, this doesn’t make sense at all, as I mentioned in my previous post: falling off the end without returning a value is undefined behaviour.

So in conclusion:

  1. Only use function try blocks for constructors.
  2. Don’t try to do anything other than rethrowing (possibly as another type) or causing side effects like logging.

For a more in-depth discussion of this, have a look at Sutter’s Mill: Constructor Failures (or, The Objects That Never Were) by Herb Sutter. If you would like a recap of exception handling and constructor initializers thrown in, I recommend starting with Introduction to Function Try Blocks by Alan Nash.

Why Member Functions are not Virtual by Default


A common error in C++ that the compiler cannot catch for you is getting a base class function called instead of the expected overriding one because you forgot to make it virtual. Example:

#include <iostream>
using namespace std;

struct Base {
    void print() {
        cout << "Base" << endl;
    }
};

struct Derived : public Base {
    void print() {
        cout << "Derived" << endl;
    }
};

int main() {
    Base b;
    Derived d;
    b.print(); //Base
    d.print(); //Derived
    Base *bp = &d;
    bp->print(); //Base (Here we would like to see Derived)
}

In Java, this would print Derived, as methods are virtual by default. In C++ however, this is not so. But if we add virtual to the declaration, we get the result we are looking for:

(...)
struct Base {
    virtual void print() {
(...)
    b.print(); //Base
    d.print(); //Derived
    bp->print(); //Derived
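For a side-by-side comparison, here is a sketch with both variants in one listing. The functions return their text instead of printing it, and all names except Base and Derived are made up. Since C++11 you can also mark the derived function override, which turns a forgotten virtual in the base class into a compile error.

```cpp
#include <string>

// Without virtual: the static type of the reference picks the function.
struct PlainBase {
    std::string name() const { return "Base"; }
};
struct PlainDerived : PlainBase {
    std::string name() const { return "Derived"; }
};

// With virtual: the dynamic type of the object picks the function.
struct Base {
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};
struct Derived : Base {
    // override: the compiler verifies there is a virtual function to override
    std::string name() const override { return "Derived"; }
};

// Both helpers call through a reference to the base type.
std::string name_of(const PlainBase& b) { return b.name(); }
std::string name_of(const Base& b) { return b.name(); }
```

Passing a PlainDerived to name_of() yields "Base", while passing a Derived yields "Derived".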

Why is this so? Why can’t all member functions be virtual, just like in most other object oriented languages? There are a couple of reasons:

Do not pay for what you do not need

There is an important guideline in the design of C++, that a feature should not incur any overhead when it is not used. And polymorphism is impossible to achieve without at least a small overhead. The way this is usually implemented is with a table of function pointers for each class, called a virtual table, or just vtbl. In our example, Base and Derived would each have a table with one entry, pointing to Base::print() and Derived::print(), respectively. Note that there is only one vtbl per class, not per object, so this overhead is not large, but it is there.

In addition to a vtbl per class, every object has a pointer to the vtbl of its class, adding (usually) one word to each object. This is another small spatial overhead.
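You can measure the per-object cost on your own platform with sizeof. The two structs below are hypothetical and differ only in the virtual keyword. The exact numbers depend on the compiler and ABI, so the sketch only relies on the virtual version being larger, which holds on the usual vtbl-based implementations.

```cpp
#include <cstddef>

// Identical data, identical member function, except for virtual.
struct Plain {
    int value;
    void touch() {}
};
struct WithVirtual {
    int value;
    virtual void touch() {}
};

// Extra bytes per object: typically one pointer, plus any padding it causes.
const std::size_t per_object_overhead = sizeof(WithVirtual) - sizeof(Plain);
```

On a typical 64-bit system the difference comes out as more than one word, because the hidden pointer also forces padding after the int.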

Finally, there is a small overhead in time as well, as functions get an extra level of indirection as they are resolved through the vtbl. This is however usually negligible.

All in all, the overhead is so small that one might challenge the non-virtual default. But still, a rule is a rule. And there is more!

Backward compatibility

C++ was designed to be as close to 100% compatible with C as possible. If you have an object of a really simple class*, you can pass it to a C function just as if it were an instance of a good old C struct, and be sure that it will be ok. Its size is just the combined size of its data members (plus padding, if any), and it can be copied with memcpy(). As soon as you add a virtual method, the size of the object is no longer the size expected by C, it has an extra magic word (or maybe more) that C won’t recognize, and it cannot be portably used with memcpy().
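Since C++11, the standard library lets you ask the compiler whether a type is still safe to treat this way. A sketch with two made-up types, Point and VirtualPoint:

```cpp
#include <cstring>
#include <type_traits>

struct Point {
    int x;
    int y;
};
struct VirtualPoint {
    int x;
    int y;
    virtual void print() const {}
};

// Point has the layout and copy semantics a C function would expect.
static_assert(std::is_standard_layout<Point>::value, "C-compatible layout");
static_assert(std::is_trivially_copyable<Point>::value, "memcpy is fine");

// One virtual function is enough to lose the memcpy guarantee.
static_assert(!std::is_trivially_copyable<VirtualPoint>::value, "no memcpy");

// Copies a Point the C way, which is well defined for this type.
Point copy_with_memcpy(const Point& source) {
    Point target;
    std::memcpy(&target, &source, sizeof(Point));
    return target;
}
```

Putting such a static_assert next to a struct that gets passed to C code makes the compiler complain the day someone adds a virtual function to it.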

And a third one?

There is a third argument as well, arguing that virtual methods break encapsulation, and should be used with care. While I agree, other languages have solved this in a perfectly good manner, by having member functions virtual by default, but making it possible to disallow inheritance. Java does this by the final keyword. Which is why I don’t really count this as an argument against virtual-by-default.

So if you don’t want to pay for what you don’t use, and if you depend on C, you cannot have virtual-by-default.


* More specifically a Plain Old Data Structure (POD), which I won’t cover here.

 

New sourcecode formatting


Thanks to Alf, I have discovered the WordPress [sourcecode] tag. Where I would previously use <code> and <pre> and get the following result:

int true_random() {
    return 4; //Generated by fair roll of a die
}

I now use [sourcecode language="cpp"] ... [/sourcecode] and get the following:

int true_random() {
    return 4; //Generated by fair roll of a die
}

The syntax highlighting and line numbering is done client-side with JavaScript, but even without JavaScript the code looks nicely formatted with <pre>.

I have gone through all my older posts and updated them to use the new tag, please let me know if you see any spots I missed, or any errors I made in the process.

Puzzle #0: Call Sequence


Here’s a puzzle that should highlight a couple of interesting features in C++:

What is the output of the following program?

#include <iostream>
using namespace std;

int main();

void term() {
    cout << "term()" << endl;
    main();
}

struct Positive {
    Positive(int i) {
        set_terminate(term);
        cout << "Positive::Positive(" << i << ")" << endl;
        if (i <= 0)
            throw main;
    }
};

Positive n(-1);

int main() {
    cout << "main()" << endl;
    Positive p(1);
}

If you just want the answer, you will find it at the bottom of this post. But first, I will go through the why.

Lines 1-2 are uninteresting. Lines 4-18 contain all the magic, but I’ll come back to those, I want to go through the program in chronological order as it is being executed. The entry point for this program is not main(), but rather the definition of Positive n(-1). This variable is global, and so will be initialized before main() is called. To initialize it, its constructor is called with i = -1, and so the real action starts on line 13.
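If the idea of code running before main() seems odd, this separate sketch isolates it. Announcer and event_log() are hypothetical names. Within one translation unit, globals are initialized in the order they are defined.

```cpp
#include <string>

// Function-local static, so the log exists before anything writes to it.
std::string& event_log() {
    static std::string log;
    return log;
}

// Records its name when constructed.
struct Announcer {
    explicit Announcer(const char* who) {
        event_log() += who;
        event_log() += ";";
    }
};

Announcer first("first");    // runs before main()
Announcer second("second");  // same translation unit: runs after first
```

By the time main() starts, event_log() already contains both entries.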

On line 13, I use set_terminate() to set a termination handler. The termination handler will be called if a thrown exception is not handled. Otherwise, this statement has no visible immediate effect. We then encounter the first printout of this program, on line 14:

Positive::Positive(-1)

Our Positive class is however designed to only handle positive numbers, and throws on line 16. It is set to throw main, but this is really just a decoy. In C++, you can throw whatever you want, I happen to throw a function pointer to main(). In particular, this does not mean that main() is called.
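To see that throwing a function pointer really does not call the function, consider this separate sketch with a made-up function forty_two(). The function only runs when the handler calls it explicitly.

```cpp
int forty_two() { return 42; }

// Throws a pointer to forty_two; throwing it does not call it.
int call_thrown_function() {
    try {
        throw &forty_two;
    } catch (int (*fn)()) {
        return fn();  // the function runs only here, by explicit call
    }
}
```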

The exception thrown on line 16 is never handled, and so the builtin terminate() calls the termination handler term(), which was registered with set_terminate() and is defined on lines 6-9. Now we encounter our second printout:

term()

This is a straightforward cout on line 7. Then, we manually call main(). Strictly speaking, the C++ standard (unlike C) does not allow a program to call main() or take its address, but gcc evidently accepts both, so the puzzle still runs. This is the third printout:

main()

main() then attempts to instantiate the local Positive object p(1). This calls the Positive constructor, which prints out

Positive::Positive(1)

This is a valid positive number, so no more exceptions are thrown. Control is returned to main(), which returns control to term(), which, being a termination handler, is not allowed to return, but still attempts to do so. The standard requires a termination handler to end the program without returning to its caller, so what happens next is undefined behaviour. No matter what, returning doesn’t make any sense, as there is nowhere to return to. What happens on my system (gcc@linux) is program abortion with SIGABRT. What happens on yours?
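For contrast, a termination handler that plays by the rules ends the program itself instead of returning. A sketch, where log_and_abort() and install_handler() are hypothetical names:

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>

// A well-behaved handler: reports the problem, then never returns.
[[noreturn]] void log_and_abort() {
    std::cerr << "Unhandled exception, aborting." << std::endl;
    std::abort();
}

// Registers the handler and confirms that it is now the active one.
bool install_handler() {
    std::set_terminate(log_and_abort);
    return std::get_terminate() == log_and_abort;
}
```

Installing the handler does nothing visible by itself. It only runs if an exception later goes unhandled.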

Finally, the complete output of the program

$ ./outsidemain
Positive::Positive(-1)
term()
main()
Positive::Positive(1)
Aborted

This is probably not the most useful post I have written, but it highlights a few interesting points, and making a puzzle is always fun.