In which I argue you shouldn’t be afraid of returning even large objects by value.
If you have somewhat large collections of somewhat large objects in a performance-critical application, which of the following functions would you prefer?
void getObjects(vector<C>& objs);
vector<C> getObjects();
The first version looks faster, right? After all, the second one returns a copy of the vector, and to do that, all the elements have to be copied. Sounds expensive! Better, then, to pass in a reference to a vector that is filled, and avoid the expensive return.
The second version is however easier to use, since it communicates more clearly what it does, and does not require the caller to define the vector to be filled. Compare
doSomethingWith(getObjects());
against the more cumbersome
vector<C> temp;
getObjects(temp);
doSomethingWith(temp);
Sounds like a classic tradeoff between speed and clarity then. Except it isn’t! Both functions incur the exact same number of copies, even on the lowest optimization levels, and without inlining anything. How is that possible? The answer is the Return Value Optimization (RVO), which allows the compiler to optimize away the copy by having the caller and the callee use the same chunk of memory for both “copies”.
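A minimal sketch of that claim, counting element copies for both signatures. The type `Counted` and the two helper functions at the bottom are hypothetical names made up for this illustration; they are not the code from the example below. It also assumes a C++11 or later compiler, where a returned local vector is moved if the RVO is not applied, so the elements are not copied again in either case.

```cpp
#include <vector>

// Hypothetical element type that counts its copy constructions.
struct Counted {
    static int copies;
    Counted() {}
    Counted(const Counted&) { ++copies; }
};
int Counted::copies = 0;

// Version 1: fill a vector supplied by the caller.
void getObjects(std::vector<Counted>& objs) {
    objs.push_back(Counted());  // one copy: the temporary is copied into the vector
}

// Version 2: return the vector by value.
std::vector<Counted> getObjects() {
    std::vector<Counted> objs;
    objs.push_back(Counted());  // the same single copy
    return objs;                // elided or moved; the elements are not copied again
}

// Number of element copies made when using the out-parameter version.
int copiesWithOutParam() {
    Counted::copies = 0;
    std::vector<Counted> temp;
    getObjects(temp);
    return Counted::copies;
}

// Number of element copies made when using the return-by-value version.
int copiesWithReturnValue() {
    Counted::copies = 0;
    std::vector<Counted> objs = getObjects();
    (void)objs;  // only the counter matters here
    return Counted::copies;
}
```

Both helpers should report the same count, the single copy made by `push_back`.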
If you got the point, and take my word for it, you can stop reading now. What follows is a somewhat lengthy example demonstrating the RVO being used in several typical situations.
Example
Basically, I have a class C, which counts the times it is constructed or copy constructed, and a library of functions that demonstrate slightly different ways of returning instances of C.
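The actual class is in the linked repository. As a rough sketch only, such a counting class and its reporting helper could look like this; the member names `ctors` and `copies` and the reset-after-printing behaviour are assumptions of mine, chosen to match the output format shown further down.

```cpp
#include <iostream>
#include <string>

// Sketch of a counting class; the member names are assumptions.
class C {
public:
    static int ctors;   // default constructions since the last report
    static int copies;  // copy constructions since the last report
    C() { ++ctors; }
    C(const C&) { ++copies; }
};
int C::ctors = 0;
int C::copies = 0;

// Print the counters for one experiment, then reset them for the next one.
void print_copies(const std::string& label) {
    std::cout << label << " used " << C::copies << " copies, "
              << C::ctors << " ctors.\n";
    C::ctors = 0;
    C::copies = 0;
}
```

Resetting the counters inside `print_copies` is what lets each numbered experiment report only its own constructions.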
Here are the getter functions:
C getTemporaryC() {
    return C();
}

C getLocalC() {
    C c;
    return c;
}

C getDelegatedC() {
    return getLocalC();
}

vector<C> getVectorOfC() {
    vector<C> v;
    v.push_back(C());
    return v;
}
I then call each of these functions, measuring the number of constructors and copy constructors called:
int main() {
    C c1;
    print_copies("1: Constructing");
    C c2(c1);
    print_copies("2: Copy constructing");
    C c3 = getTemporaryC();
    print_copies("3: Returning a temporary");
    C c4 = getLocalC();
    print_copies("4: Returning a local");
    C c5 = getDelegatedC();
    print_copies("5: Returning through a delegate");
    vector<C> v = getVectorOfC();
    print_copies("6: Returning a local vector");
}
Update: I used gcc 4.5.2 to test this. Since then, people have tested using other compilers, getting less encouraging results. Please see the comments, and the summary table near the end.
This is the result:
1: Constructing used 0 copies, 1 ctors.
2: Copy constructing used 1 copies, 0 ctors.
3: Returning a temporary used 0 copies, 1 ctors.
4: Returning a local used 0 copies, 1 ctors.
5: Returning through a delegate used 0 copies, 1 ctors.
6: Returning a local vector used 1 copies, 1 ctors.
Discussion
1 and 2 are just there to demonstrate that the counting works. In 1, the constructor is called once, and in 2 the copy constructor is called once.
Then we get to the interesting part: in 3 and 4, we see that returning a copy does not invoke the copy constructor, even when the initial C is allocated on the stack in a local variable.
Then we get to 5, which also returns by value, but where the initial object is not allocated by the function itself. Rather, it gets its object from calling yet another function. Even this chaining of functions doesn’t defeat the RVO; there is still not a single copy being made.
Finally, in 6, we try returning a container, a vector. Aha! A copy was made! But the copy that gets counted is made by vector::push_back(), not by returning the vector. So we see that the RVO also works when returning containers.
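To isolate where that one copy comes from, here is a small sketch comparing `push_back` with C++11's `emplace_back`, which constructs the element directly inside the vector. `Counted`, the two getters and `copiesUsedBy` are hypothetical names for this illustration, and a C++11 or later compiler is assumed.

```cpp
#include <vector>

// Hypothetical element type that counts its copy constructions.
struct Counted {
    static int copies;
    Counted() {}
    Counted(const Counted&) { ++copies; }
};
int Counted::copies = 0;

// Builds a temporary and copies it into the vector, as in example 6.
std::vector<Counted> getVectorWithPushBack() {
    std::vector<Counted> v;
    v.push_back(Counted());
    return v;
}

// Constructs the element in place inside the vector, so no temporary is copied.
std::vector<Counted> getVectorWithEmplaceBack() {
    std::vector<Counted> v;
    v.emplace_back();
    return v;
}

// Runs a getter and reports how many element copies it caused in total.
int copiesUsedBy(std::vector<Counted> (*getter)()) {
    Counted::copies = 0;
    std::vector<Counted> v = getter();
    (void)v;  // only the counter matters here
    return Counted::copies;
}
```

If the copy really belongs to `push_back`, the first getter should report one copy and the second none, with the return itself contributing nothing in either case.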
A curious detail
The normal rule for optimization used by the C++ standard is that the compiler is free to use whatever crazy cheating tricks it can come up with, as long as the result is no different from the non-optimized code. Can you spot where this rule is broken? In my example, the copy constructor has a side effect, incrementing the counter of copies made. That means that if the copy is optimized away, the result of the program is now different with and without RVO! This is what makes the RVO different from other optimizations, in that the compiler is actually allowed to optimize away the copy constructor even if it has side effects.
Conclusion
This has been my longest post so far, but the conclusion is simple: Don’t be afraid of returning large objects by value! Your code will be simpler, and just as fast.
UPDATE: Several people have been nice enough to try the examples in various compilers, here is a summary of the number of copies made in examples 3-6:
Compiler | Temporary | Local | Delegate | Vector | SUM | Contributed by |
---|---|---|---|---|---|---|
Clang 3.2.1 | 0 | 0 | 0 | 1 | 1 | Anders S. Knatten |
Embarcadero RAD Studio 10.1 U. 2 (clang) bcc32c/bcc64 | 0 | 0 | 0 | 1 | 1 | Eike |
GCC 4.4.5 | 0 | 0 | 0 | 1 | 1 | Anders S. Knatten |
GCC 4.5.2 | 0 | 0 | 0 | 1 | 1 | Anders S. Knatten |
GCC 4.5.2 -std=c++0x | 0 | 0 | 0 | 1 | 1 | Anders S. Knatten |
GCC 4.6.4 -std=c++0x | 0 | 0 | 0 | 1 | 1 | Anders S. Knatten |
GCC 4.7.3 -std=c++0x | 0 | 0 | 0 | 1 | 1 | Anders S. Knatten |
Visual Studio 2008 | 0 | 0 | 0 | 1 | 1 | Anders S. Knatten |
Visual Studio 2010 | 0 | 0 | 0 | 1 | 1 | Dakota |
Visual Studio 2012 | 0 | 0 | 0 | 1 | 1 | Dakota |
Visual Studio 2013 Preview | 0 | 0 | 0 | 1 | 1 | Dakota |
Visual Studio 2005 | 0 | 0 | 0 | 2 | 2 | Dakota |
IBM XL C/C++ for AIX, V10.1 | 0 | 0 | 0 | 2 | 2 | Olexiy Buyanskyy |
IBM XL C/C++ for AIX, V11.1 (5724-X13) | 0 | 0 | 0 | 2 | 2 | Olexiy Buyanskyy |
IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72) | 0 | 0 | 0 | 2 | 2 | Olexiy Buyanskyy |
Embarcadero RAD Studio 10.1 Update 2 (prev gen) bcc32 | 0 | 1 | 1 | 2 | 4 | Eike |
Embarcadero RAD Studio XE release build | 0 | 1 | 1 | 2 | 4 | Rob |
Sun C++ 5.8 Patch 121017-14 2008/04/16 | 0 | 1 | 1 | 2 | 4 | Bruce Stephens |
Sun C++ 5.11 SunOS_i386 2010/08/13 | 0 | 1 | 1 | 2 | 4 | Asgeir S. Nilsen |
Sun C++ 5.12 SunOS_sparc Patch 148506-18 2014/02/11 | 0 | 1 | 1 | 2 | 4 | Olexiy Buyanskyy |
Visual C++ 6 SP6 (Version 12.00.8804) [0-3] | 0 | 1 | 1 | 2 | 4 | Martin Moene |
HP ANSI C++ B3910B A.03.85 | 0 | 1 | 2 | 2 | 5 | Bruce Stephens |
UPDATE 2: Thomas Braun has written a similar post, including more intricate examples and move semantics. Read it here (pdf).
You can download all the example code from this post at Github.
I knew about the RVO without thinking much but I was always puzzled by how is it possible that it can optimize the copy constructor if the copy constructor *does* something. But your paragraph “curious detail” clarifies it. I wonder whether RVO is an optimization at all, since seems to be part of the language, as “in RVO-type situation the copy constructor must not be used”. Cool thanks.
Thanks for your comment alfC. I would indeed say it is an optimization, since “[the] implementation is permitted” to do this, it is not required to.
I’ve been making the same argument just recently, and my tests with gcc (on x86, x86_64) support it. However, running the same test using Sun Studio 10 (SPARC) shows that the compiler’s not applying RVO, even with -fast, -O. Admittedly it’s an old compiler, but not obscenely old (with stlport, it supports most of what we want to do with templates, exceptions, etc.).
So anyway, I’d suggest (as always) testing. It’s possible RVO won’t happen on one or other of your platforms.
Thanks a lot for your comment! I worked in Sun Studio 12 last year, but didn’t do any tests for RVO there unfortunately. And now I don’t have access to it any more.
Anyway, your point about testing is very good. If anyone else wants to test, feel free to download my little test-suite from github, as mentioned at the end of the article.
I got the same (negative) results for HP-UX (PA-RISC) and Windows. Both of those are using outdated compilers (Windows is VS 2003, I think), but even so. Disappointing. Roll on C++11 and rvalues and move constructors…
I’ll check out VS 2008 tomorrow.
C++11 might also help, especially for the vector-case, but I’m not sure when the necessary features will be available in Sun/Visual Studio.
Could you please post your exact results for the compilers you tried? I thought I’d make a summary table with all the data that is coming in.
You can download everything you need in a VM here. Solaris 11 Express with Studio 12.2 preinstalled. http://www.oracle.com/technetwork/articles/servers-storage-dev/vms11expstudio-howto-401051.html
I just completed the 3GB download and tested on it. Here are the results:
oracle@solaris_11X:~/blog.knatten.org/rvo$ CC -V
CC: Sun C++ 5.11 SunOS_i386 2010/08/13
usage: CC [ options ] files. Use ‘CC -flags’ for details
oracle@solaris_11X:~/blog.knatten.org/rvo$ CC=cc CXX=CC CXXFLAGS=-fast make
CC -fast -c -o c.o c.cpp
CC -fast -c -o lib.o lib.cpp
CC -fast -c -o lib2.o lib2.cpp
CC -fast -c -o main.o main.cpp
CC -fast -o rvo c.o lib.o lib2.o main.o
oracle@solaris_11X:~/blog.knatten.org/rvo$ ./rvo
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
Thanks for doing that, Asgeir! It is disappointing to see that the only copies they manage to optimize away are the simple temporaries, though.
My experiments were done using gcc version 4.5.2, I will try Visual Studio 2008 when I get to work tomorrow.
Another comment: The VM I downloaded also had GCC installed, and it produced the same results as your GCC trials.
Here are the result from Visual Studio 2008, showing that the RVO is applied in all cases:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 1 copies, 1 constructors.
Here are some results. (Neither is an up-to-date compiler. They just happen to be what we’re using.)
CC: Sun C++ 5.8 Patch 121017-14 2008/04/16
-bash-3.00$ ./rvo
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
aCC: HP ANSI C++ B3910B A.03.85
(Compilation with “aCC -AA”.)
-bash-3.1$ ./rvo
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 2 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
Thanks Bruce,
I made a summary table now, with my gcc and VS data, as well as the data provided by Asgeir and yourself.
$ xlC -qversion
IBM XL C/C++ for AIX, V10.1
Version: 10.01.0000.0004
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
$ xlC -qversion
IBM XL C/C++ for AIX, V11.1 (5724-X13)
Version: 11.01.0000.0010
$ ./rvo
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
Visual Studio 2005 (for those of us still in the dark ages on some projects)
Release:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
Debug:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 3 copies, 1 constructors.
And in case anyone is curious:
Visual Studio 2010
Release:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 1 copies, 1 constructors.
Debug:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 1 copies, 1 constructors.
Visual Studio 2012
Release:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 1 copies, 1 constructors.
Debug:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 1 copies, 1 constructors.
Visual Studio 2013 Preview
Release:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 1 copies, 1 constructors.
Debug:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 1 copies, 1 constructors.
Although NRVO and URVO will usually help you to avoid copies, they are not always applied.
For example if you have multiple possible return paths
———–
C URVO_complicated() {
const bool param = true;
return param ? C() : C();
}
C getLocalC() {
C c;
return c;
}
C NRVO_complicated() {
const bool param = true;
C c;
return param ? c : c;
}
———–
you get with g++ trunk
7: URVO complicated used 0 copies, 1 constructors.
7: NRVO complicated used 0 copies, 1 constructors.
and with clang++ trunk
7: URVO complicated used 0 copies, 1 constructors.
7: NRVO complicated used 1 copies, 1 constructors.
And these are still really easy cases!
At least I thought that the compiler has the power of realising that it is returning the identical object in both code paths.
Full code at https://github.com/t-b/blog.knatten.org/commit/bbf70a7d22ea3af0c9a4f6107d32643d7157c46b.
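A related sketch of the multiple-return-path situation, this time with two different named locals and a move constructor that is counted separately. `Tracked` and `pickOne` are hypothetical names, not taken from the linked commit. Assuming C++11 or later, a local returned by its plain name is treated as an rvalue when the copy cannot be elided, so the fallback should be a move rather than a copy.

```cpp
// Hypothetical type that counts copies and moves separately.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Two different locals are returned depending on a runtime flag, so the
// compiler generally cannot construct both of them in the caller's storage.
Tracked pickOne(bool first) {
    Tracked a;
    Tracked b;
    if (first) {
        return a;  // not necessarily elided, but eligible for an implicit move
    }
    return b;
}
```

After `Tracked t = pickOne(true);` the copy counter should still be zero, while the move counter depends on how much the compiler elides. Note that this does not cover the `param ? c : c` form shown above, which returns a conditional expression rather than a plain variable name.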
Huh, that’s interesting. I didn’t think whether the objects are identical would matter, only the type (so it can allocate on the caller’s stack).
I’m not sure how to include your example in the stats without breaking the ones already there though, do you have a suggestion?
Just to let everyone know, t-b has written a very interesting and much more elaborate post on this, which you can find here http://www.byte-physics.de/en/cpp-copy-elision.html (pdf)
Embarcadero RAD Studio XE release build:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
Thanks! I put your data in the table.
Excellent article, however, I came up with a completely different testing scenario and I get copies every time I use return-by-value, but with pass-by-reference I get zero copies. Maybe I’m doing something wrong. My samplecode can be found here:
http://mfctips.com/2014/05/20/return-by-value-vs-pass-by-reference/
Microsoft Visual C++ 6 SP6 (Version 12.00.8804) [0-3]
Without optimisation and with maximum optimization:
> cl -nologo -W3 -EHsc -Od main.cpp c.cpp lib.cpp lib2.cpp && main
> cl -nologo -W3 -EHsc -Ox main.cpp c.cpp lib.cpp lib2.cpp && main
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
VC6 only performs unnamed RVO (URVO).
[0] From the ages there was no light nor dark ;)
[1] Had to add #include <string> to main.cpp
[2] http://en.wikipedia.org/wiki/Microsoft_Visual_Studio#Visual_Studio_6.0_.281998.29
[3] It’s a challenge to write (modern) C++ with VC6, see http://martin-moene.blogspot.nl/search/label/VC6
At any rate it encourages one to keep things simple (complexity is added to let it work ;).
cheers,
Martin
I prefer “void getObjects(vector<C>& objs);”.
If this is called in a loop, it never results in copies. RVO cannot help in this scenario.
RVO also heavily depends on the compiler being smart enough. For performance-critical code, you want something which always works.
Both are good points.
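The loop scenario can be sketched as follows. This is an illustration under my own assumptions: it uses `int` elements for simplicity, the `n` parameter and the helper `roundsThatReallocated` are hypothetical, and it tracks buffer reallocations rather than element copies. The point is that `clear()` keeps the vector's capacity, so a vector passed by reference can reuse its buffer on every iteration after the first.

```cpp
#include <cstddef>
#include <vector>

// Out-parameter version: clear() keeps the capacity, so later calls
// can refill the same buffer without allocating again.
void getObjects(std::vector<int>& objs, std::size_t n) {
    objs.clear();
    for (std::size_t i = 0; i < n; ++i) {
        objs.push_back(static_cast<int>(i));
    }
}

// Counts the rounds in which the vector's buffer moved to a new address.
int roundsThatReallocated(std::size_t n, int rounds) {
    std::vector<int> objs;
    int reallocated = 0;
    for (int r = 0; r < rounds; ++r) {
        const int* before = objs.data();
        getObjects(objs, n);
        if (objs.data() != before) {
            ++reallocated;
        }
    }
    return reallocated;
}
```

With a fixed `n`, only the first round should need to allocate. A version returning a fresh vector each time would have to allocate a new buffer in every round, whether or not the return itself is elided.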
$ /bb/util/common/SS12_3-20131030/SUNWspro/bin/CC -V
CC: Sun C++ 5.12 SunOS_sparc Patch 148506-18 2014/02/11
$ ./rvo
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
Thanks! I have updated the post with your numbers.
$ xlC -qversion
IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)
Version: 12.01.0000.0012
$ ./rvo
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
Thanks Olexiy!
Embarcadero RAD Studio 10.1 Update 2
(previous-generation) bcc32 compiler:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 1 copies, 1 constructors.
5: Returning through a delegate used 1 copies, 1 constructors.
6: Returning a local vector used 2 copies, 1 constructors.
(Clang-enhanced) bcc32c and bcc64 compiler:
1: Constructing used 0 copies, 1 constructors.
2: Copy constructing used 1 copies, 0 constructors.
3: Returning a temporary used 0 copies, 1 constructors.
4: Returning a local used 0 copies, 1 constructors.
5: Returning through a delegate used 0 copies, 1 constructors.
6: Returning a local vector used 1 copies, 1 constructors.
Thanks, I’ve updated the post now!