Zero-overhead test doubles (mocks without virtual)


In which I explain why some coders are opposed to unit-testing with test-doubles due to the overhead of virtual functions, admit that they are sometimes right, and present a zero-overhead solution.

(Sorry if this posts looks like a lot of code. It’s not as bad as it looks though, the three examples are almost identical.)

It is not uncommon to find coders shying away from virtual functions due to the performance overhead. Often they are exaggerating, but some actually have the profiler data to prove it. Consider the following example, with a simulator class that calls the model heavily in an inner loop:

    class Model {
    public:
        double getValue(size_t i);
        size_t size();
    };

    class Simulator {
        Model* model;
    public:
        Simulator() : model(new Model()) {}

        void inner_loop() {
            double values[model->size()];
            while (simulate_more) {
                for (size_t i = 0; i < model->size(); ++i) {
                    values[i] = model->getValue(i);
                }
                doStuffWith(values);
            }
        }
    };

Imagine that getValue() is a light-weight function, light-weight enough that a virtual indirection would incur a noticeable performance-hit. Along comes the TDD-guy, preaching the values of unit testing, sticking virtual all over the place to facilitate test doubles, and suddenly our simulator is slower than the competition.

Here’s how he would typically go about doing it (the modifications are marked with comments):

    class Model {
    public:
        virtual double getValue(size_t i); //<--- virtual
        virtual size_t size();
    };

    class Simulator {
        Model* model;
    public:
        Simulator(Model* model) : model(model) {} //<--- inject dependency on Model

        void inner_loop() {
            double values[model->size()];
            while (simulate_more) {
                for (size_t i = 0; i < model->size(); ++i) {
                    values[i] = model->getValue(i);
                }
                doStuffWith(values);
            }
        }
    };

Now that the methods in Model are virtual, and the Model instance is passed in to the Simulator constructor, it can be faked/mocked in a test, like this:

    class FakeModel : public Model {
    public:
        virtual double getValue(size_t i);
        virtual size_t size();
    };

    void test_inner_loop() {
        FakeModel fakeModel;
        Simulator simulator(&fakeModel);
        //Do test
    }

Unfortunately, the nightly profiler build complains that our simulations now run slower than they used to. What to do?

The use of inheritance and dynamic polymorphism is actually not needed in this case. We know at compile time whether we will use a fake or a real Model, so we can use static polymorphism, aka. templates:

    class Model {
    public:
        double getValue(size_t i); //<--- look mom, no virtual!
        size_t size();
    };

    template <class ModelT> //<--- type of model as a template parameter
    class Simulator {
        ModelT* model;
    public:
        Simulator(ModelT* model) : model(model) {} //<--- Model is still injected

        void inner_loop() {
            double values[model->size()];
            while (simulate_more) {
                for (size_t i = 0; i < model->size(); ++i) {
                    values[i] = model->getValue(i);
                }
                doStuffWith(values);
            }
        }
    };

We have now parameterized Simulator with the Model type, and can decide at compile time which one to use, thus eliminating the need for virtual methods. We still need to inject the Model instance though, to be able to insert the fake in the test like this:

    class FakeModel { //<--- doesn't need to inherit from Model, only implement the methods used in inner_loop()  
        double getValue(size_t i);
        size_t size();
    };

    void test_inner_loop() {
        FakeModel fakeModel; 
        Simulator<FakeModel> simulator(&fakeModel);
        //Do test
    }

That’s it, all the benefits of test doubles, with zero performance overhead. There is a drawback however, in that we end up with templated code that wouldn’t need to be if it wasn’t for the purpose of testing. Not an ideal solution, but if both correctness and speed is important, and you have profiler data to prove the need, this is probably the way to go.

As usual, the sourcecode used in this post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter. Also, if you write code-heavy posts like this in vim and want to automate copying in code snippets from an external file, check out my vim-plugin SnippetySnip (also available on GitHub).

Static and Dynamic Polymorphism – Siblings That Don’t Play Well Together


Static and dynamic polymorphism might sound related, but do not play well together, a fact I had to deal with l this week.

This is the situation: We have a set of worker classes, that work on a set of data classes. The worker classes can do different types of work, but the type of work is always known at compile time, so they were created using static polymorphism, aka templates.

template &lt;typenmame WorkImpl&gt;
class Worker {
...
};

WorkImpl decides the details of the work, but the algorithm is defined in Worker, and is always the same.

Those are the workers. We then have a pretty big tree of data classes. What kind of objects we need to deal with is not known until runtime, so we are using inheritance and virtual methods for these.

  A
 / \
B   C
|
D 

etc.

A has a virtual method fill() that gets passed a WorkImpl object, and fills it with data. fill() needs to be implemented in all the subclasses of A, to fill it with their special data. Bascially, what we want is to say

class A { 
  template  &lt;typename WorkImpl&gt;
  virtual fill(WorkImpl&amp; w) {...} //overridden in B, C, D etc.
};

Can you spot the problem?

Hint: When will the templated fill() methods be instantiated?

There is no way for the compiler to know that classes A-D will ever have their fill()-methods called, thus the fill()-methods will never be instantiated!

One solution to this problem is to make one version of fill() per type of WorkImpl, in each data class A-D. The body of a fill() method is independent of the type of WorkImpl, so this solution would lead to a lot of unnecessary duplication.

The solution we ended up with was creating wrapper methods that take a specific WorkImpl, and just calls fill() with that argument:

class A {
  virtual fillWrapper(OneWorkImpl&amp; w) { fill(w) };
  virtual fillWrapper(AnotherWorkImpl&amp; w) { fill(w) };
  template &lt;typename WorkImpl&gt;
  virtual fill(WorkImpl&amp; w) {...}
}

In this way we minimize code duplication, and still make sure the fill() methods are always instantiated.

Question: Can we get away with defining the wrappers in A only? And if not, when will it fail?

Answer: We can not. If we only define the wrappers in A, fill() will never be instantiated in B-D. Everything compiles and links, but at runtime, it is the A version that will be called for all the classes, leading to a wrong result.

Finally, I should mention that another option would have been to refactor the worker classes to use dynamic polymorphism, which would basically solve everything, and allow us to get rid of the wrapper functions. But that would have been a bigger job, which we didn’t have time for.

Why Member Functions are not Virtual by Default


A common error in C++ that the compiler cannot catch for you, is getting a base function called instead of the expected overridden one because you forgot to make it virtual. Example:

#include <iostream>
using namespace std;

struct Base {
    void print() {
        cout << "Base" << endl;
    }
};

struct Derived : public Base {
    void print() {
        cout << "Derived" << endl;
    }
};

int main() {
    Base b;
    Derived d;
    b.print(); //Base
    d.print(); //Derived
    Base *bp = &d;
    bp->print(); //Base (Here we would like to see Derived)
}

In Java, this would print Derived, as methods are virtual by default. In C++ however, this is not so. But if we add virtual to the declaration, we get the result we are looking for:

(...)
struct Base {
    virtual void print() {
(...)
    b.print(); //Base
    d.print(); //Derived
    bp->print(); //Derived

Why is this so? Why can’t all member functions be virtual, just like in most other object oriented languages? There are a couple of reasons:

Do not pay for what you do not need

There is an important guideline in the design of C++, that a feature should not incur any overhead when it is not used. And polymorphism is impossible to achieve without at least a small overhead. The way this is usually implemented is with a table of function pointers for each class, called a virtual table, or just vtbl. In our example, Base and Derived would each have a table with one entry, pointing to Base::print() and Derived::print(), respectively. Note that there is only one vtbl per class, not per object, so this overhead is not large, but it is there.

In addition to a vtbl per class, every object has a pointer to the vtbl of its class, adding (usually) one word to each object. This is another small spacial overhead.

Finally, there is a small overhead in time as well, as functions get an extra level of indirection as they are resolved through the vtbl. This is however usually negligible.

All in all, the overhead is so small that one might challenge the non-virtual default. But still, a rule is a rule. And there is more!

Backward compatibility

C++ was designed to be as close to 100% compatible with C as possible. If you have an object of a really simple class*, you can pass it to a C function just as if it were an instance of a good old C struct, and be sure that it will be ok. Its size is just the combined size of its data members (plus padding, if any), and it can be copied with memcpy(). As soon as you add a virtual method, the size of the object is no longer the size expected by C, it has an extra magic word (or maybe more) that C won’t recognize, and it cannot be portably used with memcpy().

And a third one?

There is a third argument as well, arguing that virtual methods break encapsulation, and should be used with care. While I agree, other languages have solved this in a perfectly good manner, by having member functions virtual by default, but making it possible to disallow inheritance. Java does this by the final keyword. Which is why I don’t really count this as an argument against virtual-by-default.

So if you don’t want to pay for what you don’t use, and if you depend on C, you cannot have virtual-by-default.


* More specifically a Plain Old Data Structure (POD), which I won’t cover here.

 

Always catch exceptions by reference


Some code I am responsible for at work recently got exposed to a new environment, and suddenly started logging exceptions. After some investigation, my colleague Thomas and I felt pretty sure what was going on, except the exception we saw in the logs was of a different type than the one that was thrown by the suspected culprit. What was going on, could the error be some place else after all? Then we looked at some intermediate code in the call hierarchy, and found the following (as usual, simplified) code :

class Exception{};
class SpecialException : public Exception {};

try {
    throw SpecialException();
} catch (Exception e) {
    cout << "Caught" << typeid(e).name() << endl;
}

Spot the problem? catch (Exception e) results in slicing. We are catching by value, which results in copying the exception. But since we are catching Exception, not SpecialException, Exception‘s copy construcor is called, and we lose the SpecialException part of the object. Here is an illustration of the Exception copy constructor in action:

Exception copy constructor slices away the SpecialException part of the object

On the left side is the full SpecialException object, with its Exception part shown in green. The green line illustrates the copy constructor, which only lets the Exception part of the object throuh.

The right way to write this would be catch (Exception& e), which would result in a reference to the original SpecialException. Thanks to polymorphism, this program will now print “Caught SpecialException”, whereas the original example prints “Caught Exception”.

As a side note, always throw by value, not by pointer. That is, do throw Exception();, not throw new Exception();. After all, you want to be throwing an exception, not a pointer. And as always, there is no point in allocating on the heap if you don’t have to. Requiring people to clean up memory for you when catching is not very polite.