Non-virtual destructors

I got a user-submitted question to CppQuiz.org the other day where the contributor had gotten the answer wrong, and I didn’t notice at first. The question was about non-virtual destructors:

#include <iostream>

struct B {
  B() {
    std::cout << 'b';
  }
  ~B() {
    std::cout << 'B';
  }
};

struct D : B {
  D() {
    std::cout << 'd';
  }
  ~D() {
    std::cout << 'D';
  }
};

int main() {
  B* p = new D;
  delete p;
}

As usual on CppQuiz, the objective is to figure out the output of the program, according to the C++ standard. Before continuing, please try to find the answer yourself!

What does the standard say?

Did you reply bdB? So did the person who contributed the question, and I initially went “yep, that seems logical”. It does indeed seem logical that these functions are called, but what exactly happens when we fail to destroy the derived part of an object? Let’s have a look at §5.3.5/3 in the C++11 standard:

if the static type of the object to be deleted is different from its dynamic type, the static type shall be a base class of the dynamic type of the object to be deleted and the static type shall have a virtual destructor or the behavior is undefined.

The static type here is B, which is indeed a base class of the dynamic type D. The static type B does however not have a virtual destructor, so the behavior is undefined. (I have yet to see a compiler that doesn’t print bdB though.)

So why is this undefined behavior? It would be hard for the standard to specify exactly what happens when an object is not properly destroyed.

In practice, most compilers will probably go with the normal rules for non-virtual functions, call only ~B(), print bdB, and simply not destroy the D part of the object. But since the behavior is undefined, there’s no guarantee for this.

One could argue that even if the result of not destroying D properly is undefined, the standard could still mandate which destructor(s) actually get called. However, when undefined behavior occurs in a program, the entire execution is undefined, so there really would be no point.

If you’re curious about undefined behavior, I’ve written about it before. That article also has links to some really interesting, more in-depth articles about the subject.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

How to contribute questions for cppquiz.org

You might be familiar with my C++ quiz site, cppquiz.org. But did you know you can contribute your own questions to the site? First, try solving a few questions to get the gist of it, then visit http://cppquiz.org/quiz/create to create your own.

I try to live by the “quality over quantity” motto for the questions on the site, so you’ll find the following guidelines on the submission form:

Your question should be short, and demonstrate one thing only
Your question should compile (unless not compiling is the point)
Your question should be about standard C++, not compiler specific or about 3rd party libraries
Your question should not be a trick question, and be free from distractions
Your explanation should be clear and to the point
Your explanation should use correct terminology, and refer to the standard where possible
Prefer well defined programs over programs with compilation errors, undefined or unspecified behaviour

But what does all of that mean?

Your question should be short, and demonstrate one thing only

The shorter the code in the question, the better. The point of a question is to teach one single aspect of C++, not to be an exercise in reading and understanding an unfamiliar code base.

Your question should compile (unless not compiling is the point)

It should be possible to copy your code verbatim and have it compile as C++11, without errors, requirements for additional includes etc. (Of course, some questions are not intended to compile, then this rule does not apply.)

Your question should be about standard C++, not compiler specific or about 3rd party libraries

This should be fairly self-explanatory: don’t make questions about POSIX, Boost, windows.h etc.

Your question should not be a trick question, and be free from distractions

I’ve refused several questions due to this. Your question is supposed to be non-trivial, but the difficulty should be to understand a concept of C++, not to read the question correctly, or find the clue in the midst of distractions. Don’t make questions where the key is noticing that the variables v1 and vl are not the same. Don’t make questions full of complicated C++ that doesn’t matter, but is just there to hide the simple core of the question.

Your explanation should be clear and to the point

Don’t go on lengthy asides in the explanation. As your question is already short and demonstrates one thing only, explaining that single concept should not take too many words.

Your explanation should use correct terminology, and refer to the standard where possible

The chances of having your question published quickly increase if you use correct C++ terminology to describe the concepts in your question. It helps me a lot if you’re also able to provide references to the standard, but don’t let this stop you from contributing.

Prefer well defined programs over programs with compilation errors, undefined or unspecified behaviour

There are a lot of fun and interesting things to learn from questions that don’t compile, or contain undefined or unspecified behaviour. It’s however not so fun when most of the questions on the site can be answered simply by enumerating those three alternatives as an answer.

That’s it! I hope I didn’t scare you away from contributing. Please have a go, and don’t be afraid to ask if you have any questions about your question! :)

Another Reason to Avoid #includes in Headers

I have already argued that you shouldn’t put all your includes in your .h files. Here is one more reason: compilation time.

Have a look at this example, where the arrows mean “includes”

file5.h apparently needs access to something which is defined in file4.h, which again needs access to three other headers. Because of this, all other files that include file5.h also include file[1-4].h.

In C++, #include "file.h" really means “copy the entire contents of file.h here before compiling”. So in this example, file[1-3].h are copied into file4.h, which is then copied into file5.h, which again is copied into the three cpp files. Every file takes a bit of time to compile, and now each cpp file doesn’t only need to compile its own content, but also all of the (directly and indirectly) included headers. What happens if we add some compilation times to our diagram? The compilation time of the file itself is on the left; the compilation time including the included headers is on the right.

As we can see, this has a dramatic effect on compilation time. file6.cpp and file7.cpp just needed something from file5.h, but got an entire tree of headers which added 1.2 seconds to their compilation times. This might not sound like much, but those numbers add up when the number of files is large. Also, if you’re trying to do TDD, every second counts. And in some cases, compilation times of individual headers can be a lot worse than in this example.

What if file5 didn’t really need to have #include "file4.h" in the header, but could move it to the source file instead? Then we would have this diagram:

The compilation time of file[6-7].cpp is significantly reduced.

Now let’s look at what happens if a header file is modified. Let’s say you need to make a minor update in file1.h. When that file is changed, all the files that include it need to be recompiled as well:

But if we were able to #include "file4.h" in file5.cpp instead of file5.h, only one cpp file would need to recompile:

Recompilation time: 1.7 seconds vs. 4.5 seconds. And in a large project, this would be a lot worse. So please try to move your #includes down into the cpp files whenever you can!

The Graphviz code and makefile for this blog post are available on GitHub.


New Software Community, and Google Test Demo

Together with Håkon K. Olafsen, I founded Kjeller Software Community earlier this summer. Our first meetup is due this Wednesday (September 26). I will do a demo of Google C++ Testing Framework, aka. GoogleTest, and afterwards we will have a few beers and chat about programming and all things geeky. Hopefully we will also get some good suggestions for future events and meetups. The meetup will happen at Klimt pub in Lillestrøm, at 6:00 pm.

This meetup is done in cooperation with Oslo C++ Users Group, of which I am also a member. If you are into C++, and live in the greater Oslo area, that group is also highly recommended.

Notes from the talk, and a blog post about Google Test will be up later this week.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter. You can also follow Kjeller Software Community on Twitter.

Private Inheritance

In which I introduce private inheritance, but discourage its use.

When inheriting in C++, you normally see

class Derived : public Base {};

It’s almost as if public is synonymous with inherits from. But did you know there is also private inheritance, and why you (probably) don’t see it a lot?

When inheriting publicly from a base class, all base members will be accessible from the derived class, with the same accessibility as in the base class. Given these classes:

class Base
{
public:
    void pub() {}
private:
    void priv() {}
};

class DerivedPublic : public Base
{
};

class DerivedPrivate : private Base
{
};

Public inheritance results in this:

    DerivedPublic derivedPublic;
    derivedPublic.pub();
    //derivedPublic.priv(); //error: ‘void Base::priv()’ is private

Whereas private inheritance results in this:

    DerivedPrivate derivedPrivate;
    //derivedPrivate.pub(); //error: ‘void Base::pub()’ is inaccessible
    //derivedPrivate.priv(); //error: ‘void Base::priv()’ is private

So why would you want to inherit privately? To allow Derived to access the public members of Base, without exposing them to the users of Derived.

Inside the class, the members are accessible:

class DerivedPrivate2: private Base
{
public:
    void foo() { pub(); }
};

But outside, they are not:

    DerivedPrivate2 derivedPrivate2;
    derivedPrivate2.foo();
    //derivedPrivate2.pub(); //error: ‘void Base::pub()’ is inaccessible

But wait a minute, doesn’t this look a whole lot like the good old inheritance (is-a) vs. composition (has-a)? It does indeed! Private inheritance is really a has-a. And in most circumstances composition and private inheritance are interchangeable. However, since inheritance results in stronger coupling, the general recommendation is to choose composition instead. Here is how DerivedPrivate2 would look using composition:

class NotDerived
{
public:
    void foo() { b.pub(); }
private:
    Base b;
};

Now you may be thinking: “But I read somewhere that you need to use private inheritance if you want to override a virtual Base method?” You probably did, and I’ll get back to that in the next post.

As usual, the code for this blog post is available on GitHub.


Basic gloox Tutorial

There didn’t seem to be a basic gloox tutorial available, and the example on their home page is out of date, so I decided to write one. This tutorial is based on their example, but extended and updated to work with gloox version 1.0.

Since you are reading this, you probably already know, but gloox is a popular library for the Extensible Messaging and Presence Protocol (XMPP), formerly known as Jabber. In this basic tutorial I will demonstrate how to create an XMPP bot that connects to a server (using TLS), listens for messages, and answers them in an annoying way. If you just want the code, get bot.cpp from GitHub, compilation instructions are at the top of the file.

The first thing we need to do is set up a MessageHandler. This will connect to the server, and handle incoming messages.

class Bot : public MessageHandler {
public:
    Bot() {
        JID jid("bot@localhost");
        client = new Client( jid, "botpwd" );
        connListener = new ConnListener();
        client->registerMessageHandler( this );
        client->registerConnectionListener(connListener);
        client->connect(true);
    }

4: The ID of our user. In my example I use a local XMPP server, with a previously created user “bot”. (See the appendix for instructions on how to install an XMPP server.)
5: Here we create the client, with our ID and password.
6: We need a ConnectionListener to handle connections, I will get back to that later.
7: The Client needs a MessageHandler to handle incoming messages. Since the Bot itself inherits from MessageHandler, we just pass this.
8: The ConnectionListener must also be registered with the Client.
9: Finally we connect to the XMPP server. There are two ways to connect, blocking and non-blocking. In our example, we use blocking connections. If you go for non-blocking, you need to call Client::recv() at regular intervals to receive data.

To handle messages, Bot needs to implement a method handleMessage:

    virtual void handleMessage( const Message& stanza, MessageSession* session = 0 ) {
        cout << "Received message: " << stanza << endl;
        Message msg(stanza.subtype(), stanza.from(), "Tell me more about " + stanza.body() );
        client->send( msg );
    }

2: Print the received message (using a custom operator<<).
3: Create a new message to whoever sent us a message, asking them to tell us more about this fascinating subject.
4: Send the message, using the Client we created in the Bot constructor.

We also need a ConnectionListener to handle connections. As a minimum it needs to override three pure virtual methods:

class ConnListener : public ConnectionListener {
public:
    virtual void onConnect() {
        cout << "ConnListener::onConnect()" << endl;
    }
    virtual void onDisconnect(ConnectionError e) {
        cout << "ConnListener::onDisconnect() " << e << endl;
    }
    virtual bool onTLSConnect(const CertInfo& info) {
        cout << "ConnListener::onTLSConnect()" << endl;
        return true;
    }
};

Here I just cout the names of the methods when they are called, to see that everything works. If there is an error, onDisconnect() displays the error code. The error codes can be found in gloox.h.
The main method to pay attention to is onTLSConnect(). This should check the TLS cert credentials, and return true if they are accepted. In this tutorial we accept whatever we get.

Then you can run main like this:

int main() {
    Bot b;
}

You can now chat with the bot using any XMPP client:

(See the appendix for instructions on installing and setting up an XMPP client.)

That’s it! Get the full code from GitHub.

To compile, you need to install gloox and pthreads. On Ubuntu, gloox is installed doing sudo apt-get install libgloox-dev, pthreads should come with your compiler. To compile using g++, do g++ -o bot bot.cpp -lgloox -lpthread. Other Linuxes should be similar. On Windows, download and install from http://camaya.net/gloox/download, and read the included documentation.


Appendix A: How to install a local XMPP server
To test this, I installed ejabberd. If you are using Ubuntu, there is a simple, good tutorial available. As soon as ejabberd is installed, visit http://<servername>:5280/admin/server/localhost/users/ and add the user “bot”, with password “botpwd”. Also add another user to play the other part of the conversation (I used “anders”).

Appendix B: How to install and test with an XMPP client
There are many XMPP clients available, I used Pidgin (sudo apt-get install pidgin on Ubuntu). After installing, open the main window (“Buddy List”) and select Accounts -> Manage Accounts -> Add. Select protocol “XMPP”, the other username you created on your XMPP server (not “bot”) and a domain (“localhost” if you are running on localhost). On the Advanced page, select port “5222”, server “localhost” (or whatever host you are running on). Click “Add”, and tick the “Enabled”-box in the list of accounts. Hit “Close”. Now go to the Buddy List again, select “Buddies”, “Add Buddy”. In “Buddy’s username”, enter “bot@localhost” (or whatever host you are running on).

If you have started your bot (./bot on the command line), you can now double-click on the bot in your Buddy List as you would normally do to chat with someone in any IM program, and type your message. If everything went according to plan, the bot will be very interested, and want to know more!

Disempower Every Variable

In which I argue you should reduce the circle of influence, and the ability to change, of every variable.

The more a variable can do, the harder it is to reason about. If you want to change a single line of code involving the variable, you need to understand all its other uses. To make your code more readable and maintainable, you should disempower all your variables as much as possible.

Here are two things you can do to minimize the power of a variable:

1: Reduce its circle of influence (minimize the scope)
I once had to make a bugfix in a 400 line function, containing tens of for-loops. They all reused a single counter variable:

{
  int i;
  (...)
  for (i = 0; i < n; ++i) {
  }
  (...)
  for (i = 0; i < n; ++i) {
  }
  //350 lines later...
  for (i = 0; i < n; ++i) {
  }
}

When looking at a single for-loop, how am I to know that the value of i is not used after the specific loop I was working on? Someone might be doing something like

for (i = 0; i < n; ++i) {
}
some_array[i] = 23;

for (i = 0; i < n; ++i) {
}
for (; i < m; ++i) {
}

The solution here is of course to use a variable local to each for-loop (unless it actually is used outside of the loop):

for (int i = 0; i < n; ++i) {
}
for (int i = 0; i < n; ++i) {
}

Now I can be sure that if I change i in one for-loop, it won’t affect the rest of the function.

2: Take away its ability to change (make it const)

(I have blogged about const a few times before. It is almost always a good idea to make everything that doesn’t need to change const.)

Making a local variable const helps the reader to reason about the variable, since they will instantly know that its value will never change:

void foo() {
  const string key = getCurrentKey();
  (...) //Later...
  doSomethingWith(key);
  (...) //Even later...
  collection.getItem(key).process();
}

Here the reader knows that we are always working with the same key throughout foo().

In summary: Reduce the circle of influence (by reducing the scope) and take away the ability to change (by using const).


Undefined Behaviour — Worse Than its Reputation?

Last week I wrote about The Difference Between Unspecified and Undefined Behaviour. This week I’d like to expand a bit more on the severity of undefined behaviour. If however you have a lot of time, instead go read A Guide to Undefined Behavior in C and C++ by John Regehr of the University of Utah, and then What Every C Programmer Should Know About Undefined Behavior by Chris Lattner of the LLVM project, as they cover this material in much more depth (and a lot more words!) than I do here.

To expand on the example from last week, what is the output of this program?

#include <iostream>
using namespace std;

int main()
{
    int array[] = {1,2,3};
    cout << array[3] << endl;
    cout << "Goodbye, cruel world!" << endl;
}

A good guess would be a random integer on one line, then “Goodbye, cruel world!” on another line. A better guess would be that anything can happen on the first line, but then “Goodbye, cruel world!” for sure is printed. The answer is however that we can’t even know that, since “If any step in a program’s execution has undefined behavior, then the entire execution is without meaning.” [Regehr p.1]

This fact has two implications that I want to emphasize:

1: An optimizing compiler can move the undefined operation to a different place than it is given in the source code
[Regehr p.3] gives a good example of this:

int a;

void foo (unsigned y, unsigned z)
{
  bar();
  a = y%z; //Possible divide by zero
}

What happens if we call foo(1,0)? You would think bar() gets called, and then the program crashes. The compiler is however allowed to reorder the two lines in foo(), and [Regehr p.3] indeed shows that Clang does exactly this.

What are the implications? If you are investigating a crash in your program and never see the results of bar(), you might falsely conclude that the bug in the source code must be before bar() is called, or in its very beginning. To find the real bug in this case you would have to turn off optimization, or step through the program in a debugger.

2: Seemingly unrelated code can be optimized away near a possible undefined behaviour
[Lattner p.1] presents a good example:

void contains_null_check(int *P) {
  int dead = *P;
  if (P == 0)
    return;
  *P = 4;
}

What happens if P is NULL? Maybe some garbage gets stored in int dead? Maybe dereferencing P crashes the program? At least we can be sure that we will never reach the last line, *P = 4 because of the check if (P == 0). Or can we?

An optimizing compiler applies its optimizations in series, not in one omniscient operation. Imagine two optimizations acting on this code, “Redundant Null Check Elimination” and “Dead Code Elimination” (in that order).

During Redundant Null Check Elimination, the compiler figures that if P == NULL, then int dead = *P; results in undefined behaviour, and the entire execution is undefined. The compiler can basically do whatever it wants. If P != NULL however, there is no need for the if-check. So it safely optimizes it away:

void contains_null_check(int *P) {
  int dead = *P;
  //if (P == 0)
    //return;
  *P = 4;
}

During Dead Code Elimination, the compiler figures out that dead is never used, and optimizes that line away as well. This invalidates the assumption made by Redundant Null Check Elimination, but the compiler has no way of knowing this, and we end up with this:

void contains_null_check(int *P) {
  *P = 4;
}

When we wrote this piece of code, we were sure (or so we thought) that *P = 4 would never be reached when P == NULL, but the compiler (correctly) optimized away the guard we meticulously had put in place.

Concluding notes
If you thought undefined behaviour only affected the operation in which it appears, I hope I have convinced you otherwise. And if you found the topic interesting, I really recommend reading the two articles I mentioned in the beginning (A Guide to Undefined Behavior in C and C++ and What Every C Programmer Should Know About Undefined Behavior). And the moral of the story is of course to avoid undefined behaviour like the plague.


The Difference Between Unspecified and Undefined Behaviour

What is the output of this program?

#include <iostream>
using namespace std;

int main()
{
    int array[] = {1,2,3};
    cout << array[3] << endl;
}

Answer: No one knows!

What is the output of this program?

#include <iostream>
using namespace std;

void f(int i, int j){}

int foo()
{
    cout << "foo ";
    return 42;
}

int bar()
{
    cout << "bar ";
    return 42;
}

int main()
{
    f(foo(), bar());
}

Answer: No one knows!

There is a difference in the severity of uncertainty though. The first case results in undefined behaviour (because we are indexing outside of the array), whereas the second results in unspecified behaviour (because we don’t know the order in which the function arguments will be evaluated). What is the difference?

In the case of undefined behaviour, we are screwed. Anything can happen, from what you thought should happen, to the program sending threatening letters to your neighbour’s cat. Probably it will read the memory right after where the array is stored, interpret whatever garbage is there and print it, but there is no way to know this.

In the case of unspecified behaviour however, we are probably OK. The implementation is allowed to choose from a set of well-defined behaviours. In our case, there are two possibilities, calling foo() then bar(), or bar() then foo(). Note that if foo() and bar() have some side-effects that we rely on being executed in a specific order, this unspecified behaviour would still mean we have a bug in our code.

To summarize, never write code that results in undefined behaviour, and never write code that relies on unspecified behaviour.


I am Taking a Hiatus

I am very busy building Something™, and sadly have to deprioritize blogging for a few months. Don’t delete me from your feed reader though, there might be some sporadic posts, and I will be back in 2011! Also, I’ll keep you posted when Something™ is released.

(If you want to be notified when I resume normal operations, please add https://blog.knatten.org/feed/ to your feed reader, post a comment on this post, or send me an email at anders at knatten dot org.)
