Why we should see an uptake in <algorithm> usage

Anders Schau Knatten C++ May 4, 2012May 7, 2012 2 Minutes

With C++11 out, I think we should see an uptake in use of the good old std <algorithm>. Why?

A common thing to do in a program is to iterate over a container of objects, producing another container of other objects. Imagine for instance you have a vector of domain objects:

struct DomainObject
{
    string label;
};
vector<DomainObject> objects;

Now you want to produce a vector containing the labels of all your domain objects. This is the “classical” solution:

    vector<string> labels(objects.size());
    for (size_t i = 0; i < objects.size(); ++i)
        labels[i] = objects[i].label;

You can however instead use std::transform, which is more declarative, immune to Off-by-one errors, ~~possibly more optimization friendly~~ etc. This is how it looks:

    vector<string> labels(objects.size());
    transform(objects.begin(), objects.end(), labels.begin(), label_for);

The problem is however that you need a function / function object to provide as the last argument to transform. Here is the one I used:

string label_for(const DomainObject& obj)
{
    return obj.label;
}

This reduces locality, and makes the code harder to read. Unless the helper is sufficiently advanced that you would want to either reuse it a lot or test it, it would be better to be able to write it directly in the transform call. This is exactly what C++11 lambdas are good for, and where I’ll think we’ll see them used a lot:

    vector<string> labels(objects.size());
    transform(objects.begin(), objects.end(), labels.begin(), [](const DomainObject& o){return o.label;});

This isn’t a complete introduction to lambdas, but if you haven’t seen them before, here is a quick intro. Lambdas are just a fancy name for functions without a name. That means you can simply type them in directly where you’d normally call a function. [] means “anonymous function follows” (at least for the purposes of this article), and then you just type out any normal function body. Mine takes a reference to a DomainObject and returns its label, just like label_for() did.

Here is another example, using std::find_if to look for a specific element in a container:

    auto matched = find_if(objects.begin(), objects.end(), [](const DomainObject& o) { return o.label == "two"; });
    cout << matched->label << endl;

Notice the use of auto, another C++11 feature. It uses type inference to deduce the type of the variable by looking at the rest of the expression. Here it understands that you will be getting a vector<DomainObject>::iterator from find_if(), so there is no need for you to type that out.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Discover more from C++ on a Friday

Subscribe to get the latest posts sent to your email.

Published by Anders Schau Knatten

Computer Programmer since 1995. View all posts by Anders Schau Knatten

Published May 4, 2012May 7, 2012

12 thoughts on “Why we should see an uptake in <algorithm> usage”

Mahesh says:

May 4, 2012 at 14:19

I understand you intent of using lambda and std::transform algorithm. If the intention is just to copy the contents, we can just use “operator =” (or) use the copy constructor itself.

vector labels(objects);
(or)
vector labels(objects.size());
labels = objects;

Good blog post by the way :)

Reply
1. Anders Schau Knatten says:
  
  May 4, 2012 at 14:32
  
  Thanks for your comment Mahesh. The intention here is not to copy the contents to a new vector of DomainObjects, but to make a new vector of strings, containing the labels of the DomainObjects.
  
  Reply
  1. Mahesh says:
    
    May 4, 2012 at 21:54
    
    I didn’t notice `labels` type is std::vector. My bad. Also, I don’t see here an option of how to edit my comments. Allocator type for std::vector is missing in my earlier comment.
    
    Reply
    1. Mahesh says:
      
      May 4, 2012 at 21:57
      
      `labels` is a type of std::vector
      
      Reply
      1. Mahesh says:
        
        May 4, 2012 at 22:01
        
        Oops … Your blog comment section is eating away my allocator types. Probably it is thinking them as html tags and disregarding as it is not understanding them. What is the escape sequence character ?
        
        Reply
        
        Anders Schau Knatten says:
        
        May 5, 2012 at 08:29
        
        It seems WordPress requires you to use < and >. If you write “<string>”, it will show as “<string>”.
        
        Reply
Kent says:

May 4, 2012 at 15:27

For trivial cases, I think the range-based for is often going to be easier to write than the algorithm + lambda combination. Consider:

// whether or not this should be initialized with objects.size() is worthy of a separate blog entry
vector labels;
for(const auto& o : objects)
{
labels.push_back(o.label);
}

Reply
1. Anders Schau Knatten says:
  
  May 5, 2012 at 08:39
  
  That series of push_backs could potentially be a lot slower than using transform, especially if we don’t redim() it first to avoid multiple reallocations during the for loop. (If you initialize it with size() and then do push_back you will get the wrong result btw.) I would argue that this is not premature optimization, it is just using the language. The compiler might be able to optimize this entire loop into the equivalent of a transform, but I don’t know if it will. I am at least sure it won’t in debug mode, which I’d argue is important as well (at least in my field).
  
  In addition to the performance overhead, I actually think the transform version is more declarative and easier to read, as it states what the programmers wants done, not how to do it.
  
  Reply
  1. Kent says:
    
    May 7, 2012 at 13:11
    
    Doing a .reserve(objects.size()); followed by a series of push_backs should be no worse than construction with a size argument. The latter implies the creation of n default-constructed strings (n == objects.size()), the former does not. Doing successive push_backs without calling reserve() may or may not be less efficient, thus my open-ended comment.
    
    transform is typically just implemented as something like:
    
    template
    OutIt transform(InIt first, InIt last, OutIt result, Op op)
    {
    for ( ; first != last; ++first, ++result)
    *result = op(*first);
    return result;
    }
    
    I don’t see any reason why this would result in faster generated code than the range-based for loop. In the best case, transform’s function call will be inlined, making it essentially equivalent to the for loop. In a completely unoptimized build, transform will have the additional overhead of n function call invocations compared to a for loop.
    
    Reply
    1. Anders Schau Knatten says:
      
      May 7, 2012 at 19:11
      
      My objection to construction with a size argument was just because the result would be incorrect. But you are probably right that a reserve would incur the same cost.
      
      I also think you are right about the speed.
      
      I do however still think it is nice to use <algorithm>s in places like this, because of the more declarative code they result in.
      
      Reply
      1. Kent says:
        
        May 10, 2012 at 16:55
        
        In any case, it’s nice to have both options available :)
        I see now that you did an entry on the range-based for a couple of years ago. At the moment, I don’t have a compiler that supports either, hopefully that will change in the near future!
        
        Reply
        
        Anders Schau Knatten says:
        
        May 12, 2012 at 08:54
        
        For playing around with C++11, both GCC and Visual C++ Express 2011 are free and have pretty good support. At work I am still using Visual Studio 2008, but at least we are in the process of converting our projects to VS2010, which has some support.
        
        Reply

	Anonymous on A prvalue is not a tempor…
	A prvalue is not a t… on A prvalue is not a tempor…
	A prvalue is not a t… on lvalues, rvalues, glvalues, pr…
	A prvalue is not a t… on A prvalue is not a tempor…
	A prvalue is not a t… on lvalues, rvalues, glvalues, pr…

Discover more from C++ on a Friday

Share this:

Related

Published by Anders Schau Knatten

12 thoughts on “Why we should see an uptake in <algorithm> usage”

Leave a reply to Kent Cancel reply