New Software Community, and Google Test Demo


Together with Håkon K. Olafsen, I founded Kjeller Software Community earlier this summer. Our first meetup is due this Wednesday (September 26). I will do a demo of Google C++ Testing Framework, aka. GoogleTest, and afterwards we will have a few beers and chat about programming and all things geeky. Hopefully we will also get some good suggestions for future events and meetups. The meetup will happen at Klimt pub in Lillestrøm, at 6:00 pm.

This meetup is done in cooperation with Oslo C++ Users Group, of which I am also a member. If you are into C++, and live in the greater Oslo area, that group is also highly recommended.

Notes from the talk, and a blog post about Google Test will be up later this week.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter. You can also follow Kjeller Software Community on Twitter.

Private Inheritance


In which I introduce private inheritance, but discourage its use.

When inheriting in C++, you normally see

class Derived : public Base {};

It’s almost as if public is synonymous to inherits from. But did you know there is also private inheritance, and why you (probably) don’t see it a lot?

When inheriting publicly from a base class, all base members will be accessible from the derived class, with the same accessibility as in the base class. Given these classes:

class Base
{
public:
    void pub() {}
private:
    void priv() {}
};

class DerivedPublic : public Base
{
};

class DerivedPrivate : private Base
{
};

Public inheritance results in this:

    DerivedPublic derivedPublic;
    derivedPublic.pub();
    //derivedPublic.priv(); //error: ‘void Base::priv()’ is private

Whereas private inheritance results in this:

    DerivedPrivate derivedPrivate;
    //derivedPrivate.pub(); //error: ‘void Base::pub()’ is inaccessible
    //derivedPrivate.priv(); //error: ‘void Base::priv()’ is private

So why would you want to inherit privately? To allow Derived to access the public members of Base, without exposing them to the users of Derived.

Inside the class, the members are accessible:

class DerivedPrivate2: private Base
{
public:
    void foo() { pub(); }
};

But outside, they are not:

    DerivedPrivate2 derivedPrivate2;
    derivedPrivate2.foo();
    //derivedPrivate2.pub(); //error: ‘void Base::pub()’ is inaccessible

But wait a minute, doesn’t this look a whole lot like the good old inheritance (is-a) vs. composition (has-a)? It does indeed! Private inheritance is really a has-a. And in most circumstances composition and private inheritance are interchangeable. However, since inheritance results in stronger coupling, the general recommendation is to choose composition instead. Here is how DerivedPrivate2 would look using composition:

class NotDerived
{
public:
    void foo() { b.pub(); }
private:
    Base b;
};

Now you may be thinking: “But I read somewhere that you need to use private inheritance if you want to override a virtual Base method?” You probably did, and I’ll get back to that in the next post.

As usual, the code for this blog post is available on GitHub.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Basic gloox Tutorial


There didn’t seem to be a basic gloox tutorial available, and the example on their home page is out of date, so I decided to write one. This tutorial is based on their example, but extended and updated to work with gloox version 1.0.

Since you are reading this, you probably already know, but gloox is a popular library for the Extensible Messaging and Presence Protocol (XMPP), formerly known as Jabber. In this basic tutorial I will demonstrate how to create an XMPP bot that connects to a server (using TLS), listens for messages, and answers them in an annoying way. If you just want the code, get bot.cpp from GitHub, compilation instructions are at the top of the file.

The first thing we need to do is set up a MessageHandler. This will connect to the server, and handle incoming messages.

class Bot : public MessageHandler {
public:
    Bot() {
        JID jid("bot@localhost");
        client = new Client( jid, "botpwd" );
        connListener = new ConnListener();
        client->registerMessageHandler( this );
        client->registerConnectionListener(connListener);
        client->connect(true);
    }

4: The ID of our user. In my example I use a local XMPP server, with a previously created user “bot”. (See the appendix for instructions on how to install an XMPP server.)
5: Here we create the client, with our ID and password.
6: We need a ConnectionListener to handle connections, I will get back to that later.
7: The Client needs a MessageHandler to handle incoming messages. Since the Bot itself inherits from MessageHandler, we just pass this.
8: The ConnectionListener must also be registered with the Client.
9: Finally we connect to the XMPP server. There are two ways to connect, blocking and non-blocking. In our example, we use blocking connections. If you go for non-blocking, you need to call Client::recv() at regular intervals to receive data.

To handle messages, Bot needs to implement a method handleMessage:

    virtual void handleMessage( const Message& stanza, MessageSession* session = 0 ) {
        cout << "Received message: " << stanza << endl;
        Message msg(stanza.subtype(), stanza.from(), "Tell me more about " + stanza.body() );
        client->send( msg );
    }

2: Print the received message (using a custom operator<<).
3: Create a new message to whoever sent us a message, asking them to tell us more about this fascinating subject.
4: Send the message, using the Client we created in the Bot constructor.

We also need a ConnectionListener to handle connections. As a minimum it needs to override three pure virtual methods:

class ConnListener : public ConnectionListener {
public:
    virtual void onConnect() {
        cout << "ConnListener::onConnect()" << endl;
    }
    virtual void onDisconnect(ConnectionError e) {
        cout << "ConnListener::onDisconnect() " << e << endl;
    }
    virtual bool onTLSConnect(const CertInfo& info) {
        cout << "ConnListener::onTLSConnect()" << endl;
        return true;
    }
};

Here I just cout the names of the methods when they are called, to see that everything works. If there is an error, onDisconnect() displays the error code. The error codes can be found in gloox.h.
The main method to pay attention to is onTLSConnect(). This should check the TLS cert credentials, and return true if they are accepted. In this tutorial we accept whatever we get.

Then you can run main like this:

int main() {
    Bot b;
}

You can now chat with the bot using any XMPP client:
A Pidgin client talking to the bot
(See the appendix for instructions on installing and setting up an XMPP client.)

That’s it! Get the full code from GitHub.

To compile, you need to install gloox and pthreads. On Ubuntu, gloox is installed doing sudo apt-get install libgloox-dev, pthreads should come with your compiler. To compile using g++, do g++ -o bot bot.cpp -lgloox -lpthread. Other Linuxes should be similar. On Windows, download and install from http://camaya.net/gloox/download, and read the included documentation.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Appendix A: How to install a local XMPP server
To test this, I installed ejabbered. If you are using Ubuntu, there is a simple, good tutorial available. As soon as ejabberd is installed, visit http://<servername&gt;:5280/admin/server/localhost/users/ and add the user “bot”, with password “botpwd”. Also add another user to play the other part of the conversation (I used “anders”).

Appendix B: How to install and test with an XMPP client
There are many XMPP clients available, I used Pidgin (sudo apt-get install pidgin on Ubuntu). After installing, open the main window (“Buddy List”) and select Accounts -> Manage Accounts -> Add. Select protocol “XMPP”, the other username you created on your XMPP server (not “bot”) and a domain (“localhost” if you are running on localhost). On the Advanced page, select port “5222”, server “localhost” (or whatever host you are running on). Click “Add”, and tick the “Enabled”-box in the list of accounts. Hit “Close”. Now go to the Buddy List again, select “Buddies”, “Add Buddy”. In “Buddy’s username”, enter “bot@localhost” (or whatever host you are running on).

If you have started your bot (./bot on the command line), you can now double-click on the bot in you Buddy List as you would normally do to chat with someone in any IM program, and type your message. If everything went according to plan, the bot will be very interested, and want to know more!

Disempower Every Variable


In which I argue you should reduce the circle of influence, and the ability to change, of every variable.

The more a variable can do, the harder it is to reason about. If you want to change a single line of code involving the variable, you need to understand all its other uses. To make your code more readable and maintainable, you should disempower all your variables as much as possible.

Here are two things you can do to minimize the power of a variable:

1: Reduce its circle of influence (minimize the scope)
I once had to make a bugfix in a 400 line function, containing tens of for-loops. They all reused a single counter variable:

{
  int i;
  (...)
  for (i = 0; i < n; ++i) {
  }
  (...)
  for (i = 0; i < n; ++i) {
  }
  //350 lines later...
  for (i = 0; i < n; ++i) {
  }
}

When looking at a single for-loop, how am I to know that the value of i is not used after the specific loop I was working on? Someone might be doing something like

for (i = 0; i < n; ++i) {
}
some_array[i] = 23

or

for (i = 0; i < n; ++i) {
}
for (; i < m; ++i) {
}

The solution here is of course to use a local variable to each for-loop (unless of course it actually is used outside of the loop):

for (int i = 0; i < n; ++i) {
}
for (int i = 0; i < n; ++i) {
}

Now I can be sure that if I change i in one for-loop, it won’t affect the rest of the function.

2: Take away its ability to change (make it const)

(I have blogged about const a few times before. It is almost always a good idea to make everything that doesn’t need to change const.)

Making a local variable const helps the reader to reason about the variable, since he will instantly know that its value will never change:

void foo() {
  const string key = getCurrentKey();
  (...) //Later...
  doSomethingWith(key);
  (...) //Even later...
  collection.getItem(key).process();

Here the reader knows that we are always working with the same key throughout foo().

In summary: Reduce the circle of influence (by reducing the scope) and take away the ability to change (by using const).

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

Undefined Behaviour — Worse Than its Reputation?


Last week I wrote about The Difference Between Unspecified and Undefined Behaviour. This week I’d like to expand a bit more on the severity of undefined behaviour. If however you have a lot of time, instead go read A Guide to Undefined Behavior in C and C++ by John Regehr of the University of Utah, and then What Every C Programmer Should Know About Undefined Behavior by Chris Lattner of the LLVM project, as they cover this material in much more depth (and a lot more words!) than I do here.

To expand on the example from last week, what is the output of this program?

int main()
{
    int array[] = {1,2,3};
    cout << array[3] << endl;
    cout << "Goodbye, cruel world!" << endl;
}

A good guess would be a random integer on one line, then “Goodbye, cruel world!” on another line. A better guess would be that anything can happen on the first line, but then “Goodbye, cruel world!” for sure is printed. The answer is however that we can’t even know that, since If any step in a program’s execution has undefined behavior, then the entire execution is without meaning. [Regehr p.1].

This fact has two implications that I want to emphasize:

1: An optimizing compiler can move the undefined operation to a different place than it is given in the source code
[Regehr p.3] gives a good example of this:

int a;

void foo (unsigned y, unsigned z)
{
  bar();
  a = y%z; //Possible divide by zero
}

What happens if we call foo(1,0)? You would think bar() gets called, and then the program crashes. The compiler is however allowed to reorder the two lines in foo(), and [Regehr p.3] indeed shows that Clang does exactly this.

What are the implications? If you are investigating a crash in your program and never see the results of bar(), you might falsely conclude that the bug in the sourcecode must be before bar() is called, or in its very beginning. To find the real bug in this case you would have to turn off optimization, or step through the program in a debugger.

2: Seemingly unrelated code can be optimized away near a possible undefined behaviour
[Lattner p.1] presents a good example:

void contains_null_check(int *P) {
  int dead = *P;
  if (P == 0)
    return;
  *P = 4;
}

What happens if P is NULL? Maybe some garbage gets stored in int dead? Maybe dereferencing P crashes the program? At least we can be sure that we will never reach the last line, *P = 4 because of the check if (P == 0). Or can we?

An optimizing compiler applies its optimizations in series, not in one omniscient operation. Imagine two optimizations acting on this code, “Redundant Null Check Elimination” and “Dead Code Elimination” (in that order).

During Redundant Null Check Elimination, the compiler figures that if P == NULL, then int dead = *P; results in undefined behaviour, and the entire execution is undefined. The compiler can basically do whatever it wants. If P != NULL however, there is no need for the if-check. So it safley optimizes it away:

void contains_null_check(int *P) {
  int dead = *P;
  //if (P == 0)
    //return;
  *P = 4;
}

During Dead Code Elimination, the compiler figures out that dead is never used, and optimizes that line away as well. This invalidates the assumption made by Redundant Null Check Elimination, but the compiler has no way of knowing this, and we end up with this:

void contains_null_check(int *P) {
  *P = 4;
}

When we wrote this piece of code, we were sure (or so we thought) that *P = 4 would never be reached when P == NULL, but the compiler (correctly) optimized away the guard we meticulously had put in place.

Concluding notes
If you thought undefined behaviour only affected the operation in which it appears, I hope I have convinced you otherwise. And if you found the topic interesting, I really recommend reading the two articles I mentioned in the beginning (A Guide to Undefined Behavior in C and C++ and What Every C Programmer Should Know About Undefined Behavior). And the morale of the story is of course to avoid undefined behaviour like the plague.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

The Difference Between Unspecified and Undefined Behaviour


What is the output of this program?

int main()
{
    int array[] = {1,2,3};
    cout << array[3] << endl;
}

Answer: Noone knows!

What is the output of this program?

void f(int i, int j){}

int foo()
{
    cout << "foo ";
    return 42;
}

int bar()
{
    cout << "bar ";
    return 42;
}

int main()
{
    f(foo(), bar());
}

Answer: Noone knows!

There is a difference in the severity of uncertainty though. The first case results in undefined behaviour (because we are indexing outside of the array), whereas the second results in unspecified behaviour (because we don’t know the order in which the function arguments will be evaluated). What is the difference?

In the case of undefined behaviour, we are screwed. Anything can happen, from what you thought should happen, to the program sending threatening letters to your neighbour’s cat. Probably it will read the memory right after where the array is stored, interpret whatever garbage is there and print it, but there is no way to know this.

In the case of unspecified behaviour however, we are probably OK. The implementation is allowed to choose from a set of well-defined behaviours. In our case, there are two possibilities, calling foo() then bar(), or bar() then foo(). Note that if foo() and bar() have some side-effects that we rely on being executed in a specific order, this unspecified behaviour would still mean we have a bug in our code.

To summarize, never write code that results in undefined behaviour, and never write code that relies on unspecified behaviour.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

I am Taking a Hiatus


I am very busy building Something™, and sadly have to downprioritize blogging for a few months. Don’t delete me from your feed reader though, there might be some sporadic posts, and I will be back in 2011! Also, I’ll keep you posted when Something™ is released.

(If you want to be notified when I resume normal operations, please add https://blog.knatten.org/feed/ to your feed reader, post a comment on this post, or send me an email at anders at knatten dot org.)