Why we should see an uptake in <algorithm> usage


With C++11 out, I think we should see an uptake in use of the good old std <algorithm>. Why?

A common thing to do in a program is to iterate over a container of objects, producing another container of other objects. Imagine for instance you have a vector of domain objects:

struct DomainObject
{
    string label;
};
vector<DomainObject> objects;

Now you want to produce a vector containing the labels of all your domain objects. This is the “classical” solution:

    vector<string> labels(objects.size());
    for (size_t i = 0; i < objects.size(); ++i)
        labels[i] = objects[i].label;


You can however instead use std::transform, which is more declarative, immune to Off-by-one errors, possibly more optimization friendly etc. This is how it looks:

    vector<string> labels(objects.size());
    transform(objects.begin(), objects.end(), labels.begin(), label_for);


The problem is however that you need a function / function object to provide as the last argument to transform. Here is the one I used:

string label_for(const DomainObject& obj)
{
    return obj.label;
}


This reduces locality, and makes the code harder to read. Unless the helper is sufficiently advanced that you would want to either reuse it a lot or test it, it would be better to be able to write it directly in the transform call. This is exactly what C++11 lambdas are good for, and where I’ll think we’ll see them used a lot:

    vector<string> labels(objects.size());
    transform(objects.begin(), objects.end(), labels.begin(), [](const DomainObject& o){return o.label;});


This isn’t a complete introduction to lambdas, but if you haven’t seen them before, here is a quick intro. Lambdas are just a fancy name for functions without a name. That means you can simply type them in directly where you’d normally call a function. [] means “anonymous function follows” (at least for the purposes of this article), and then you just type out any normal function body. Mine takes a reference to a DomainObject and returns its label, just like label_for() did.

Here is another example, using std::find_if to look for a specific element in a container:

    auto matched = find_if(objects.begin(), objects.end(), [](const DomainObject& o) { return o.label == "two"; });
    cout << matched->label << endl;


Notice the use of auto, another C++11 feature. It uses type inference to deduce the type of the variable by looking at the rest of the expression. Here it understands that you will be getting a vector<DomainObject>::iterator from find_if(), so there is no need for you to type that out.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

12 thoughts on “Why we should see an uptake in <algorithm> usage

  1. I understand you intent of using lambda and std::transform algorithm. If the intention is just to copy the contents, we can just use “operator =” (or) use the copy constructor itself.

    vector labels(objects);
    (or)
    vector labels(objects.size());
    labels = objects;

    Good blog post by the way :)

    1. Thanks for your comment Mahesh. The intention here is not to copy the contents to a new vector of DomainObjects, but to make a new vector of strings, containing the labels of the DomainObjects.

      1. I didn’t notice `labels` type is std::vector. My bad. Also, I don’t see here an option of how to edit my comments. Allocator type for std::vector is missing in my earlier comment.

          1. Oops … Your blog comment section is eating away my allocator types. Probably it is thinking them as html tags and disregarding as it is not understanding them. What is the escape sequence character ?

  2. For trivial cases, I think the range-based for is often going to be easier to write than the algorithm + lambda combination. Consider:

    // whether or not this should be initialized with objects.size() is worthy of a separate blog entry
    vector labels;
    for(const auto& o : objects)
    {
    labels.push_back(o.label);
    }

    1. That series of push_backs could potentially be a lot slower than using transform, especially if we don’t redim() it first to avoid multiple reallocations during the for loop. (If you initialize it with size() and then do push_back you will get the wrong result btw.) I would argue that this is not premature optimization, it is just using the language. The compiler might be able to optimize this entire loop into the equivalent of a transform, but I don’t know if it will. I am at least sure it won’t in debug mode, which I’d argue is important as well (at least in my field).

      In addition to the performance overhead, I actually think the transform version is more declarative and easier to read, as it states what the programmers wants done, not how to do it.

      1. Doing a .reserve(objects.size()); followed by a series of push_backs should be no worse than construction with a size argument. The latter implies the creation of n default-constructed strings (n == objects.size()), the former does not. Doing successive push_backs without calling reserve() may or may not be less efficient, thus my open-ended comment.

        transform is typically just implemented as something like:

        template
        OutIt transform(InIt first, InIt last, OutIt result, Op op)
        {
        for ( ; first != last; ++first, ++result)
        *result = op(*first);
        return result;
        }

        I don’t see any reason why this would result in faster generated code than the range-based for loop. In the best case, transform’s function call will be inlined, making it essentially equivalent to the for loop. In a completely unoptimized build, transform will have the additional overhead of n function call invocations compared to a for loop.

        1. My objection to construction with a size argument was just because the result would be incorrect. But you are probably right that a reserve would incur the same cost.

          I also think you are right about the speed.

          I do however still think it is nice to use <algorithm>s in places like this, because of the more declarative code they result in.

          1. In any case, it’s nice to have both options available :)
            I see now that you did an entry on the range-based for a couple of years ago. At the moment, I don’t have a compiler that supports either, hopefully that will change in the near future!

            1. For playing around with C++11, both GCC and Visual C++ Express 2011 are free and have pretty good support. At work I am still using Visual Studio 2008, but at least we are in the process of converting our projects to VS2010, which has some support.

Leave a reply to Kent Cancel reply