Efficient Pure Functional Programming in C++ Using Move Semantics

In which I briefly mention what pure functional programming is, explain why this can be slow in C++, and use move semantics to solve that problem. Be warned that this post is a bit longer than usual for this blog, and that it assumes more knowledge of C++11 than my posts usually do.

Pure functional programming is programming without state. In short, functions should not modify their input, not modify local or global state, but return new values instead. Often, we also want to avoid variables completely.

For instance, instead of doing:

void square(vector<int>& v) { /*multiply each element with itself*/ }

void imperative()
    vector<int> v = getNumbers();

we do

vector<int> squared(const vector<int>& v) { /*return a new vector with squared elements*/ }

void functional()

As you might already have noticed, this style results in more copies than the imperative style, which will lead to less efficient code. Depending on the problem you are solving, this might or might not matter, but one of the few real arguments for using C++ is that you have a problem that needs an efficient solution.

Let’s take a look at an example program, translate it from imperative to functional style, and finally use move semantics (explained later) to eliminate the performance hit. For our example, we will compute the root mean square of a function. Wikipedia has a good definition, but in short: Given a list of numbers, square each of them, take the mean of those squares, and return the square root of that number.

First, let’s define some helper functions:

double sum(const std::vector<double>& v)
    return std::accumulate(v.begin(), v.end(), 0, std::plus<double>());

double mean(const std::vector<double>& v)
    return sum(v) / v.size();

The square root function already exists as std::sqrt, so we don’t need to define that. The only missing piece is a function computing the square of a vector of numbers. In good old C++, there are two ways of doing this:

vector<int> square(vector<int> v); //(Or the argument could be a const reference)

void square(vector<int>& v);

The first version creates a copy of the vector, containing the squared number. This is perfect for functional code, but makes a copy that might be unnecessary. The second version modifies the argument, which is faster and often seen in imperative code. Since we assume we want efficiency, we’ll go with the second one in the examples that follow.

Here is the full, traditional, stateful solution, including square and a usage example:

    void square(std::vector<double>& v)
        std::transform(v.begin(), v.end(), v.begin(), [](double d) { return d*d; });

    double rms(std::vector<double>& v)
        double the_mean = mean(v);
        double the_rms = std::sqrt(the_mean);
        return the_rms;

    int main()
        std::vector<double> v = {1,2,3};
        double the_rms = rms(v);
        std::cout << the_rms << std::endl;
        return 0;

I won’t go into the full discussion of why functional programming is nice, but with no state there is a whole class of problems we never get into. How often have you not found a bug that was due to some code you didn’t know about mutating some seemingly unrelated state? Also, functions are a lot easier to both test and reason about when the only thing they care about is their input and output.

Let’s have a look at a pure functional alternative:

    std::vector<double> squared(std::vector<double> v)
        std::transform(v.begin(), v.end(), v.begin(), [](double d) { return d*d; });
        return std::move(v);

    double rms(const std::vector<double>& v)
        return std::sqrt(mean(squared(v)));

    int main()
        std::cout << rms({1,2,3}) << std::endl;
        return 0;

As you can see, no state, and no variables except for the function parameters, which we cannot really avoid. The implementation of rms even reads exactly like its mathematical definition, root(mean(squared(numbers))). There is however the issue of the copy being made when squared() is called. (No copy should be made when squared() returns though, due to the return value optimization).

Enter C++11’s move semantics. If you don’t know what move semantics and rvalue references are, the simplest explanation I have come across is this StackOverflow answer. You really should read it through, but if you’re short on time, here is the even quicker intro: If a squared(std::vector v) somehow knew that whoever called that function would never need the original v again, it could just “steal” it instead of making a copy. In the example above, {1,2,3} is a temporary object that can never be referred to again, so instead of copying it into squared, we could just have the internal storage of v point to the same place in memory. And this is exactly what happens with rvalue references.

Rvalue references are denoted by a double ampersand: squared(std::vector&& v), and can only bind to temporaries. Here is the pure functional example again, this time using rvalue references. This version completely avoids copying the vector:

    std::vector<double> squared(std::vector<double>&& v)
        std::transform(v.begin(), v.end(), v.begin(), [](double d) { return d*d; });
        return std::move(v);

    double rms(std::vector<double>&& v)
        return std::sqrt(mean(squared(std::move(v))));

    int main()
        std::cout << rms({1,2,3}) << std::endl;
        return 0;

{1,2,3} is a temporary, and so rms(std::vector&& v) is free to steal its resources without a deep copy. Inside of rms however, v itself is not a temporary. It is a named variable than can be referred to, and thus it would not be safe for squared(std::vector&& v) to steal its resources. We however know that we will never use it again after calling squared(v), so we tell the compiler to go ahead and steal anyway by wrapping it in std::move(). The same is done for the return value in squared().

In the functional examples, the vector that we wanted to compute the root mean square of was a temporary, thus allowing move semantics. But what if we wanted to call rms() from a non-pure part of the code base? Here’s what would happen if the original vector was a normal variable:

    std::vector<double> v = {1,2,3};
    //rms(v); //invalid initialization of reference of type ‘std::vector<double>&&’ from expression of type ‘std::vector<int>’

As you can see, we can no longer use the functional rms(). If we however know that we will not be using the original v again, we could wrap it in std::move() as we did earlier:


Finally, if that is not an option, we could manually make a copy:


In a pure functional program, our functional version of rms would be both fast and fitting the style we’re in. In an imperative program, it would stylistically be a worse fit, but we still wouldn’t loose any performance.

Finally, a few notes:

  • I have not considered to what degree copy elision would achieve the same benefits as move semantics for this example.
  • I decided not to complicate the examples using templates and universal references but that would be an interesting extension.
  • If you haven’t seen move semantics before; It is a general technique that has nothing to do with functional programming. I just think they make a great team.
  • There is a whole lot more to rvalues and move semantics than could be presented here, it is seen as one of the defining features of C++11.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

18 thoughts on “Efficient Pure Functional Programming in C++ Using Move Semantics

  1. I don’t understand why the final versions of the functions take rvalue references. If they just took regular values, you’d have the same level of efficiency when using std::move to avoid copying.

    1. You are thinking of copy elision? I agree that copy elision probably would give the same level of efficiency for most of these examples. It all depends on the compiler though, as it is an optional optimization. With move semantics you are guaranteed the efficiency of moves.

      The thing is that I am not sure which compilers are able to do what with copy elision, and whether it might break down in bigger examples. That’s one of the reasons I said in the article that I did consider copy elision in this post. The other reason is that the post was way too long already. I would however be interested to learn more about this.

      Finally, in the example where we std::move() a local variable into rms(), I would guess copy elision wouldn’t work? Or do you think the compilers will be able to deduce that we will not use the variable again, and hand it directly over to rms() without copying?

      1. “The thing is that I am not sure which compilers are able to do what with copy elision, and whether it might break down in bigger examples.”

        I’ve been doing a little testing of this for my own library. I wrote a function called map(f,xs) that called std::transform and returned a new vector. I found that even a line like

        xs = map( plusTwo, std::move(xs) );

        could be optimized very well by GCC 4.7. The assembly was equivalent to just calling std::transform. This shows not only can it not copy the container, but it can follow its movement through the inlining process and come full circle to realize it can just do a transform.

        “Or do you think the compilers will be able to deduce that we will not use the variable again, and hand it directly over to rms() without copying?”

        Actually, this is a very common optimization. For example, if you had

        int x = 5;
        int y = x + 1;
        return y;

        Any compiler would certainly notice that x goes dead and y comes alive on the second line, so the value of y can be stored in the same place as x.

        I have not run any tests on whether GCC could do this for vectors, however.

        I’ve just been pooling over the assembly of your post. Your stateful, slow, and fast mains produce almost the same code. The slow version first copies the vector since you pass by ref, but other than that, avoids any more copies. The fast version is similar, but without copying at the start. The stateful version, I think, is the most efficient.

        Mostly, however, the only difference between the slow and fast versions is who does the copying. The caller or the function? The operation has to happen either way, so I’d actually argue that they have the same efficiency when considering its place in the larger program.

        Here’s the asm: https://gist.github.com/4012385

        1. “This shows not only can it not copy the container, but it can follow its movement through the inlining process and come full circle to realize it can just do a transform.”

          Cool! A short function like your map() is perfect for inlining, but did you look at what happened if the compiler only had access to its declaration?

          And thanks for looking into the assembly! I haven’t really looked at assembly code since the last century. It is definitely a habit I should get back into.

          “Mostly, however, the only difference between the slow and fast versions is who does the copying. The caller or the function? The operation has to happen either way”

          The slow version could be improved by taking the parameter by value, then copy elision would work for rvalues even when the compiler only has access to the declaration. So no copying would actually be needed.

          When it comes to optimizing copies, we had some interesting results in this post about the return value optimization https://blog.knatten.org/2011/08/26/dont-be-afraid-of-returning-by-value-know-the-return-value-optimization/ . Several readers tried my examples on different compilers, and it turns out the performance varies a lot. GCC came out on top, together with Visual Studio. It would be interesting to see the assembly of this post too on some other compilers.

  2. I have a specific question, do you think that this technique can be used to eliminate of two versions of the same function, one in-place and another with copy (for example this http://www.boost.org/doc/libs/1_52_0/doc/html/boost/algorithm/trim_copy.html vs. this http://www.boost.org/doc/libs/1_52_0/doc/html/boost/algorithm/trim.html). It always bothered me that you need to duplicate so much code, to keep the two options available, instead of letting the compiler figure out whether you need a copy or not.

      1. I don’t think so. This would require callers to support move semantics, which is not always possible. For instance:

        -The type you are working with does not have a move constructor / move assignment operator.
        -Speed is absolutely critical (e.g. an inner loop in a real time system). Even if you eliminate the copy, moving is still more expensive than passing a reference.
        -The caller of the function does not know about modern C++, prefers to not use these features, or for some other reason prefers an in place operation.

        1. Your answer makes me think that it is indeed possible because 1) if the object has a move assignment (for example now all STL containers have move assignment, in particular std::string probably (?) 2) Moving is far less expensive than copy (for the cases I have in mind), I don’t care too much that it is O(1) (thinking of containers) more expensive than reference passing, 3) suppose one doesn’t care about callers than don’t know C++11. Given this situation, what would be a universal function that exploits move to automatically modify in place or copy depending on the context?

          1. Sorry, I thought you asked if it would be possible to eliminate those two functions, and replace them with one using move semantics. That is what I argued is not possible, since not everyone would be able to use the new version. (Btw, In addition to my previous arguments, also consider legacy code, and other string classes outside the stl).

            You could however keep the two old versions, but in addition provide a new one using move semantics. For people able to use the new one, this would eliminate the need for the two old ones.

            Now to your final question, I am a bit confused. Exactly what context would determine whether to modify in-place or not? It would be easier to answer your question if you could come up with an example of how you would want to use the function from client code.

  3. Hi! I have tested this with both clang 3.4.2 and gcc 4.9.1, and I’ve got strange results. The functional code takes twice the time that the statefull one takes when using a test vector with 1e6 entries. It seems that at least one extra copy is being performed! Thanks.

    1. Hi, and thanks for your comment! I’m really sorry that I forgot to reply.

      Anyway, it seems like you’ve uncovered a bug in my example code. In the `squared()` function, `return v` was missing an `std::move()`. I tested this in Visual C++ with a vector 10E8 big, and saw that the functional code was 50% slower than the imperative code. After fixing the bug, the functional code is only 5% slower. I was also able to verify that it returned a copy using the debugger. I have updated the example code now.

      I guess this shows that it is always important to test your code! :)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s