With C++11 out, I think we should see an uptake in use of the good old std <algorithm>
. Why?
A common thing to do in a program is to iterate over a container of objects, producing another container of other objects. Imagine for instance you have a vector
of domain objects:
struct DomainObject { string label; }; vector<DomainObject> objects;
Now you want to produce a vector containing the labels of all your domain objects. This is the “classical” solution:
vector<string> labels(objects.size()); for (size_t i = 0; i < objects.size(); ++i) labels[i] = objects[i].label;
You can however instead use std::transform
, which is more declarative, immune to Off-by-one errors, possibly more optimization friendly etc. This is how it looks:
vector<string> labels(objects.size()); transform(objects.begin(), objects.end(), labels.begin(), label_for);
The problem is however that you need a function / function object to provide as the last argument to transform
. Here is the one I used:
string label_for(const DomainObject& obj) { return obj.label; }
This reduces locality, and makes the code harder to read. Unless the helper is sufficiently advanced that you would want to either reuse it a lot or test it, it would be better to be able to write it directly in the transform
call. This is exactly what C++11 lambdas are good for, and where I’ll think we’ll see them used a lot:
vector<string> labels(objects.size()); transform(objects.begin(), objects.end(), labels.begin(), [](const DomainObject& o){return o.label;});
This isn’t a complete introduction to lambdas, but if you haven’t seen them before, here is a quick intro. Lambdas are just a fancy name for functions without a name. That means you can simply type them in directly where you’d normally call a function. []
means “anonymous function follows” (at least for the purposes of this article), and then you just type out any normal function body. Mine takes a reference to a DomainObject
and returns its label, just like label_for()
did.
Here is another example, using std::find_if
to look for a specific element in a container:
auto matched = find_if(objects.begin(), objects.end(), [](const DomainObject& o) { return o.label == "two"; }); cout << matched->label << endl;
Notice the use of auto
, another C++11 feature. It uses type inference to deduce the type of the variable by looking at the rest of the expression. Here it understands that you will be getting a vector<DomainObject>::iterator
from find_if()
, so there is no need for you to type that out.
As usual, the code for this blog post is available on GitHub.
If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.
I understand you intent of using lambda and std::transform algorithm. If the intention is just to copy the contents, we can just use “operator =” (or) use the copy constructor itself.
vector labels(objects);
(or)
vector labels(objects.size());
labels = objects;
Good blog post by the way :)
Thanks for your comment Mahesh. The intention here is not to copy the contents to a new vector of DomainObjects, but to make a new vector of strings, containing the labels of the DomainObjects.
I didn’t notice `labels` type is std::vector. My bad. Also, I don’t see here an option of how to edit my comments. Allocator type for std::vector is missing in my earlier comment.
`labels` is a type of std::vector
Oops … Your blog comment section is eating away my allocator types. Probably it is thinking them as html tags and disregarding as it is not understanding them. What is the escape sequence character ?
It seems WordPress requires you to use < and >. If you write “<string>”, it will show as “<string>”.
For trivial cases, I think the range-based for is often going to be easier to write than the algorithm + lambda combination. Consider:
// whether or not this should be initialized with objects.size() is worthy of a separate blog entry
vector labels;
for(const auto& o : objects)
{
labels.push_back(o.label);
}
That series of
push_back
s could potentially be a lot slower than usingtransform
, especially if we don’tredim()
it first to avoid multiple reallocations during the for loop. (If you initialize it withsize()
and then dopush_back
you will get the wrong result btw.) I would argue that this is not premature optimization, it is just using the language. The compiler might be able to optimize this entire loop into the equivalent of atransform
, but I don’t know if it will. I am at least sure it won’t in debug mode, which I’d argue is important as well (at least in my field).In addition to the performance overhead, I actually think the
transform
version is more declarative and easier to read, as it states what the programmers wants done, not how to do it.Doing a .reserve(objects.size()); followed by a series of push_backs should be no worse than construction with a size argument. The latter implies the creation of n default-constructed strings (n == objects.size()), the former does not. Doing successive push_backs without calling reserve() may or may not be less efficient, thus my open-ended comment.
transform is typically just implemented as something like:
template
OutIt transform(InIt first, InIt last, OutIt result, Op op)
{
for ( ; first != last; ++first, ++result)
*result = op(*first);
return result;
}
I don’t see any reason why this would result in faster generated code than the range-based for loop. In the best case, transform’s function call will be inlined, making it essentially equivalent to the for loop. In a completely unoptimized build, transform will have the additional overhead of n function call invocations compared to a for loop.
My objection to construction with a size argument was just because the result would be incorrect. But you are probably right that a
reserve
would incur the same cost.I also think you are right about the speed.
I do however still think it is nice to use
<algorithm>
s in places like this, because of the more declarative code they result in.In any case, it’s nice to have both options available :)
I see now that you did an entry on the range-based for a couple of years ago. At the moment, I don’t have a compiler that supports either, hopefully that will change in the near future!
For playing around with C++11, both GCC and Visual C++ Express 2011 are free and have pretty good support. At work I am still using Visual Studio 2008, but at least we are in the process of converting our projects to VS2010, which has some support.