Don’t be Afraid of Returning by Value, Know the Return Value Optimization

In which I argue you shouldn’t be afraid of returning even large objects by value.

If you have somewhat large collections of somewhat large objects in a performance-critical application, which of the following functions would you prefer?

void getObjects(vector<C>& objs);
vector<C> getObjects();

The first version looks faster, right? After all, the second one returns a copy of the vector, and to do that, all the elements have to be copied. Sounds expensive! Better then, to pass inn a reference to a vector that is filled, and avoid the expensive return.

The second version is however easier to use, since it communicates more clearly what it does, and does not require the caller to define the vector to be filled. Compare

    doSomethingWith(getObjects());

against the more cubmersome

    vector<C> temp;
    getObjects(temp);
    doSomethingWith(temp);

Sounds like a classic tradeoff between speed and clarity then. Except it isn’t! Both functions incur the exact same number of copies, even on the lowest optimization levels, and without inlining anything. How is that possible? The answer is the Return Value Optimization (RVO), which allows the compiler to optimize away the copy by having the caller and the callee use the same chunk of memory for both “copies”.

If you got the point, and take my word for it, you can stop reading now. What follows is a somewhat lengthy example demonstrating the RVO being used in several typical situations.

Example
Basically, I have a class C, which counts the times it is constructed or copy constructed, and a library of functions that demonstrate slightly different ways of returning instances of C.

Here are the getter functions:

C getTemporaryC() {
	return C();
}

C getLocalC() {
	C c;
	return c;
}

C getDelegatedC() {
	return getLocalC();
}

vector<C> getVectorOfC() {
	vector<C> v;
	v.push_back(C());
	return v;
}

I then call each of these functions, measuring the number of constructors and copy constructors called:

int main() {
	C c1;
	print_copies("1: Constructing");

	C c2(c1);
	print_copies("2: Copy constructing");

	C c3 = getTemporaryC();
	print_copies("3: Returning a temporary");

	C c4 = getLocalC();
	print_copies("4: Returning a local");

	C c5 = getDelegatedC();
	print_copies("5: Returning through a delegate");

	vector<C> v = getVectorOfC();
	print_copies("6: Returning a local vector");
}

Update: I used gcc 4.5.2 to test this. Since then, people have tested using other compilers, getting less encouraging results. Please see the comments, and the summary table near the end.

This is the result:

1: Constructing used 0 copies, 1 ctors. 2: Copy constructing used 1 copies, 0 ctors. 3: Returning a temporary used 0 copies, 1 ctors. 4: Returning a local used 0 copies, 1 ctors. 5: Returning through a delegate used 0 copies, 1 ctors. 6: Returning a local vector used 1 copies, 1 ctors.

Discussion
1 and 2 are just there to demonstrate that the counting works. In 1, the constructor is called once, and in 2 the copy constructor is called once.

Then we get to the interesting part; In 3 and 4, we see that returning a copy does not invoke the copy constructor, even when the initial C is allocated on the stack in a local variable.

Then we get to 5, which also returns by value, but where the initial object is not allocated by the function itself. Rather, it gets its object from calling yet antother function. Even this chaining of methods doesn’t defeat the RVO, there is still not a single copy being made.

Finally, in 6, we try returing a container, a vector. Aha! A copy was made! But the copy that gets counted is made by vector::push_back(), not by returning the vector. So we see that the RVO also works when returning containers.

A curious detail
The normal rule for optimization used by the C++ standard is that the compiler is free to use whatever crazy cheating tricks it can come up with, as long as the result is no different from the non-optimized code. Can you spot where this rule is broken? In my example, the copy constructor has a side effect, incrementing the counter of copies made. That means that if the copy is optimized away, the result of the program is now different with and without RVO! This it what makes the RVO different from other optimizations, in that the compiler is actually allowed to optimize away the copy constructor even if it has side effects.

Conclusion
This has been my longest post so far, but the conclusion is simple: Don’t be afraid of returning large objects by value! Your code will be simpler, and just as fast.

UPDATE: Several people have been nice enough to try the examples in various compilers, here is a summary of the number of copies made in examples 3-6:

Compiler	Local	Delegate	Vector	SUM	Contributed by
Clang 3.2.1	0	0	1	1	Anders S. Knatten
Embarcadero RAD Studio 10.1 U. 2 (clang) bcc32c/bcc64	0	0	1	1	Eike
GCC 4.4.5	0	0	1	1	Anders S. Knatten
GCC 4.5.2	0	0	1	1	Anders S. Knatten
GCC 4.5.2 -std=c++0x	0	0	1	1	Anders S. Knatten
GCC 4.6.4 -std=c++0x	0	0	1	1	Anders S. Knatten
GCC 4.7.3 -std=c++0x	0	0	1	1	Anders S. Knatten
Visual Studio 2008	0	0	1	1	Anders S. Knatten
Visual Studio 2010	0	0	1	1	Dakota
Visual Studio 2012	0	0	1	1	Dakota
Visual Studio 2013 Preview	0	0	1	1	Dakota
Visual Studio 2005	0	0	2	2	Dakota
IBM XL C/C++ for AIX, V10.1	0	0	2	2	Olexiy Buyanskyy
IBM XL C/C++ for AIX, V11.1 (5724-X13)	0	0	2	2	Olexiy Buyanskyy
IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)	0	0	2	2	Olexiy Buyanskyy
Embarcadero RAD Studio 10.1 Update 2 (prev gen) bcc32	1	1	2	4	Eike
Embarcadero RAD Studio XE relase build	1	1	2	4	Rob
Sun C++ 5.8 Patch 121017-14 2008/04/16	1	1	2	4	Bruce Stephens
Sun C++ 5.11 SunOS_i386 2010/08/13	1	1	2	4	Asgeir S. Nilsen
Sun C++ 5.12 SunOS_sparc Patch 148506-18 2014/02/11	1	1	2	4	Olexiy Buyanskyy
Visual C++ 6 SP6 (Version 12.00.8804) [0-3]	1	1	2	4	Martin Moene
HP ANSI C++ B3910B A.03.85	1	2	2	5	Bruce Stephens

UPDATE 2: Thomas Braun has written a similar post, including more intricate examples and move semantics. Read it here (pdf).

You can download all the example code from this post at Github.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Use boost list_of if You Can’t Have Uniform Initialization Yet

In which I demonstrate how boost::assign::list_of simplifies initialization of containers.

I have previously blogged about how Uniform Initialization Simplifies Testing. In C++, you can initialize an array when defining it, but you can not initialize containers:

	int a[] = {1, 2, 3}; //OK
	vector<int> v = {1, 2, 3}; //Not OK

You have to resort to something like this:

	//Either
	vector<int> v;
	v.push_back(1);
	v.push_back(2);
	v.push_back(3);

	//Or
	int tmp[] = {1, 2, 3}; 
	vector<int> v2(tmp, tmp+3);

In C++0X, we will have Uniform Initialization to take care of this, but if you are not on a supported compiler yet, you can use boost::assign while you are waiting:

	vector<int> v3 = boost::assign::list_of(1)(2)(3);

This is, again, especially useful in testing. To translate the example from my last post from C++0X to C++98 with boost:

using boost::assign::list_of;
 
int count_sheep(const vector<string>& animals) {
	return count(animals.begin(), animals.end(), "sheep");
}
 
TEST(TestCountSheep, returns_zero_when_there_are_no_sheep) {
	ASSERT_EQ(0, count_sheep(list_of("pig")("cow")("giraffe"))); //here
}
	 
TEST(TestCountSheep, returns_all_sheep) {
	ASSERT_EQ(2, count_sheep(list_of("sheep")("cow")("sheep"))); //and here
}

To use boost::assign, make sure to #include <boost/assign.hpp>. It has other clever tricks as well, so be sure to check out the docs. Boost is a widely used collection of high quality libraries for C++, and can be downloaded here.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Modify Visibility in a Subclass to Allow Testing

In which I argue it might sometimes be useful to test private/protected methods, and demonstrate how to do it in a framework-independent way.

How do you test private/protected functions? The most common, and often correct answer is “Don’t!”. But sometimes it can be useful, especially when test-driving helper-methods. Example:

class Ohlson {
protected:
	int helper(); //We want to test this
public:
	int compute() {
		//(...)
		int n = helper();
		//(...)
	}
};


TEST(TestOhlson, helper_returns42) {
	Ohlson o;
	ASSERT_EQ(42, o.helper());
}

This will fail: error: ‘int Ohlson::helper()’ is protected. Many unittesting frameworks provide a way to work around this, like googletest’s FRIEND_TEST, but there is a way to get around this in normal C++, by subclassing to modify visibility.

In C++, a subclass can change the visibility of a derived function. This is not possible in Java or C# as far as I know, and to be honest it does sound somewhat dangerous. But C++ wasn’t made to keep you in a padded box with a helmet on, was it?

Here’s how you do it:

class OhlsonForTest : public Ohlson {
public:
	using Ohlson::helper;
};

And voilà, helper() is now public and can be tested:

TEST(TestOhlson, helper_returns42) {
	OhlsonForTest o;
	ASSERT_EQ(42, o.helper());
}

This trick can be useful in some situations, but when you are done, see if you might refactor your way out of the problem instead.

And I would be extremely sceptical to do anything like this in production code!

By the way, this trick is often useful in combination with last weeks technique on Making a Derived Class to Allow Testing an Abstract Class.

If you enjoyed this post, you can subscribe to this blog, or follow me on Twitter

Make a Derived Class to Allow Testing an Abstract Class

In which I show how best to test an abstract class, which normally wouldn’t be instantiatable.

Abstract classes are not instantiatable, so how do you test one?

class Abstract
{
public:
	int common();
	virtual int special() = 0;
};

class Concrete : public Abstract
{
public:
	int special();
};

Say you want to test Abstract::common(), like so:

TEST(TestAbstract, helper_returns42)
{
	Abstract abs; //cannot declare variable ‘abs’ to be of abstract type ‘Abstract’
	ASSERT_EQ(42, abs.common());
}

It’s not possible, because common() special() is pure virtual (it has no implementation in Abstract). So how do you test a class that cannot be instantiated? It can be tempting to test through a concrete subclass:

TEST(TestAbstract, common_returns42)
{
	Concrete abs; 
	ASSERT_EQ(42, abs.common());
}

But this is generally a bad idea, since Concrete might change the behaviour of common(). Even though it is not virtual, it operates under a different state. And even though you know it will work now, you don’t know how Concrete will change in the future, and then you don’t want tests that don’t have anything to do with Concrete to start failing. Also, creating a Concrete object when you are testing on Abstract is confusing to the reader.

Instead, I recommend you derive a class to be used just for testing:


class AbstractForTest : public Abstract {
	int special();
};

TEST(TestAbstract, common_returns42)
{
	AbstractForTest abs; 
	ASSERT_EQ(42, abs.common());
}

In this way you get rid of the unwanted dependency, and the test is easier to understand. As a final note, make sure you name your test-class something that clearly distinguishes it as a testing artifact, like I did above. Also make sure to keep it in your test code (e.g. TestAbstract.cpp), not in your production code (e.g. Abstract.h/.cpp).

	Anders Schau Knatten on Microsoft C++ versions ex…
	Anonymous on A prvalue is not a tempor…
	A prvalue is not a t… on A prvalue is not a tempor…
	A prvalue is not a t… on lvalues, rvalues, glvalues, pr…
	A prvalue is not a t… on A prvalue is not a tempor…