Zero-overhead test doubles (mocks without virtual)


In which I explain why some coders are opposed to unit-testing with test-doubles due to the overhead of virtual functions, admit that they are sometimes right, and present a zero-overhead solution.

(Sorry if this posts looks like a lot of code. It’s not as bad as it looks though, the three examples are almost identical.)

It is not uncommon to find coders shying away from virtual functions due to the performance overhead. Often they are exaggerating, but some actually have the profiler data to prove it. Consider the following example, with a simulator class that calls the model heavily in an inner loop:

    class Model {
    public:
        double getValue(size_t i);
        size_t size();
    };

    class Simulator {
        Model* model;
    public:
        Simulator() : model(new Model()) {}

        void inner_loop() {
            double values[model->size()];
            while (simulate_more) {
                for (size_t i = 0; i < model->size(); ++i) {
                    values[i] = model->getValue(i);
                }
                doStuffWith(values);
            }
        }
    };

Imagine that getValue() is a light-weight function, light-weight enough that a virtual indirection would incur a noticeable performance-hit. Along comes the TDD-guy, preaching the values of unit testing, sticking virtual all over the place to facilitate test doubles, and suddenly our simulator is slower than the competition.

Here’s how he would typically go about doing it (the modifications are marked with comments):

    class Model {
    public:
        virtual double getValue(size_t i); //<--- virtual
        virtual size_t size();
    };

    class Simulator {
        Model* model;
    public:
        Simulator(Model* model) : model(model) {} //<--- inject dependency on Model

        void inner_loop() {
            double values[model->size()];
            while (simulate_more) {
                for (size_t i = 0; i < model->size(); ++i) {
                    values[i] = model->getValue(i);
                }
                doStuffWith(values);
            }
        }
    };

Now that the methods in Model are virtual, and the Model instance is passed in to the Simulator constructor, it can be faked/mocked in a test, like this:

    class FakeModel : public Model {
    public:
        virtual double getValue(size_t i);
        virtual size_t size();
    };

    void test_inner_loop() {
        FakeModel fakeModel;
        Simulator simulator(&fakeModel);
        //Do test
    }

Unfortunately, the nightly profiler build complains that our simulations now run slower than they used to. What to do?

The use of inheritance and dynamic polymorphism is actually not needed in this case. We know at compile time whether we will use a fake or a real Model, so we can use static polymorphism, aka. templates:

    class Model {
    public:
        double getValue(size_t i); //<--- look mom, no virtual!
        size_t size();
    };

    template <class ModelT> //<--- type of model as a template parameter
    class Simulator {
        ModelT* model;
    public:
        Simulator(ModelT* model) : model(model) {} //<--- Model is still injected

        void inner_loop() {
            double values[model->size()];
            while (simulate_more) {
                for (size_t i = 0; i < model->size(); ++i) {
                    values[i] = model->getValue(i);
                }
                doStuffWith(values);
            }
        }
    };

We have now parameterized Simulator with the Model type, and can decide at compile time which one to use, thus eliminating the need for virtual methods. We still need to inject the Model instance though, to be able to insert the fake in the test like this:

    class FakeModel { //<--- doesn't need to inherit from Model, only implement the methods used in inner_loop()  
        double getValue(size_t i);
        size_t size();
    };

    void test_inner_loop() {
        FakeModel fakeModel; 
        Simulator<FakeModel> simulator(&fakeModel);
        //Do test
    }

That’s it, all the benefits of test doubles, with zero performance overhead. There is a drawback however, in that we end up with templated code that wouldn’t need to be if it wasn’t for the purpose of testing. Not an ideal solution, but if both correctness and speed is important, and you have profiler data to prove the need, this is probably the way to go.

As usual, the sourcecode used in this post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter. Also, if you write code-heavy posts like this in vim and want to automate copying in code snippets from an external file, check out my vim-plugin SnippetySnip (also available on GitHub).

Static and Dynamic Polymorphism – Siblings That Don’t Play Well Together


Static and dynamic polymorphism might sound related, but do not play well together, a fact I had to deal with l this week.

This is the situation: We have a set of worker classes, that work on a set of data classes. The worker classes can do different types of work, but the type of work is always known at compile time, so they were created using static polymorphism, aka templates.

template &lt;typenmame WorkImpl&gt;
class Worker {
...
};

WorkImpl decides the details of the work, but the algorithm is defined in Worker, and is always the same.

Those are the workers. We then have a pretty big tree of data classes. What kind of objects we need to deal with is not known until runtime, so we are using inheritance and virtual methods for these.

  A
 / \
B   C
|
D 

etc.

A has a virtual method fill() that gets passed a WorkImpl object, and fills it with data. fill() needs to be implemented in all the subclasses of A, to fill it with their special data. Bascially, what we want is to say

class A { 
  template  &lt;typename WorkImpl&gt;
  virtual fill(WorkImpl&amp; w) {...} //overridden in B, C, D etc.
};

Can you spot the problem?

Hint: When will the templated fill() methods be instantiated?

There is no way for the compiler to know that classes A-D will ever have their fill()-methods called, thus the fill()-methods will never be instantiated!

One solution to this problem is to make one version of fill() per type of WorkImpl, in each data class A-D. The body of a fill() method is independent of the type of WorkImpl, so this solution would lead to a lot of unnecessary duplication.

The solution we ended up with was creating wrapper methods that take a specific WorkImpl, and just calls fill() with that argument:

class A {
  virtual fillWrapper(OneWorkImpl&amp; w) { fill(w) };
  virtual fillWrapper(AnotherWorkImpl&amp; w) { fill(w) };
  template &lt;typename WorkImpl&gt;
  virtual fill(WorkImpl&amp; w) {...}
}

In this way we minimize code duplication, and still make sure the fill() methods are always instantiated.

Question: Can we get away with defining the wrappers in A only? And if not, when will it fail?

Answer: We can not. If we only define the wrappers in A, fill() will never be instantiated in B-D. Everything compiles and links, but at runtime, it is the A version that will be called for all the classes, leading to a wrong result.

Finally, I should mention that another option would have been to refactor the worker classes to use dynamic polymorphism, which would basically solve everything, and allow us to get rid of the wrapper functions. But that would have been a bigger job, which we didn’t have time for.

The Simplest Possible Introduction to Traits


This is the simplest introduction to traits I could think of:

A trait is information about a type stored outside of the type.

Imagine you are making a templated algorithm, and want to use different policies for different types. If you wrote those types, you could implement the policy selection in your types directly, but this is generally not possible for built in types and types you haven’t created yourself. Traits to the rescue!

Given

template <class T>
void f(T var) {
    bool advanced = policy_trait<T>::advanced;
    if ( advanced )
        cout << "Do it fancy!" << endl;
    else
        cout << "KISS." << endl;
}

you can define policies for whatever type you want by creating template specializations of policy_trait

template <>
struct policy_trait<char> {
    static const bool advanced = false;
};

That’s it! :)

(The original paper on traits is Traits: a new and useful template technique by Nathan C. Myers, but An introduction to C++ Traits by Thaddaeus Frogley is probably a more intuitive introduction.)

Why Algorithms Need two Iterators


When using algorithms in C++, such as find(), sort(), etc., you might wonder why you need to specify both the beginning and the end of the container, and not just a reference to the container itself. See for instance:

vector<int> v; //Assume it also has some elements
sort(v.begin(), v.end());

Wouldn’t it be better if we could just write

sort(v);

and let sort call begin() and end()?

If this was the case, algorithms would only work on containers that define begin() and end(). Algorithms in C++ do however work on any type that has an iterator. And an iterator is not a type, it is pure abstraction. Anything that behaves like an iterator is an iterator. If a type has a concept of the element currently pointed to, pointing to the next element and equality, it can be an iterator.

One important example is pointers. C++ algorithms work with standard arrays, because pointers behave as iterators. It would be impossible to do this:

int a[42]; //
sort(a);

How would sort get a hold of iterators (pointers) to the beginning and end of the array? The beginning is easy, but since the length of the array is lost when passed to a function, the end is not in sight. For this to work, one would have to have special versions of all algorithms, taking the size of the array as an extra argument. This would lead to a loss of generality, and double the amount of algorithms.

Passing two iterators however preserves generality, and can be used with anything that behaves like an iterator.

Note that algorithms are not defined to take a specific iterator type, like this:

void sort(iterator begin, iterator end); //All iterators inherit from iterator

Instead, they are defined like this:

template <class It>
void sort(It begin, It end); //The type of It can be whatever

First, this lets us use pointers as iterators. Pointers of course do not inherit from an iterator type. Second, this relieves us of runtime polymorphism (which could negatively impact both speed and size). If you try use a type that does not provide the necessary iterator operators, the compiler will catch it when sort() tries to use them.

This is typical for templates; with compile time polymorphism, anything that provides the interface used by the templated function is acceptable, regardless of what (if anything) it inherits from. This is often referred to as duck typing: “when I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.” This is also what allows you to make a vector of any type. Note that this is different from for instance Java, which uses inheritance for collections instead of duck typing. In Java, all classes inherit from Object, and you can only have collections of objects, not primitive types.