What I Have Against Precompiled Headers

Precompiled headers is a technique to reduce compilation time when your #includes take a long time. Basically, you stuff all your #includes that rarely change into one single file, which you then include in all your other files. You then tell the compiler to precompile this header, so it takes a negligible amount of time to include later on.

Instead of doing:

//my_file.cpp
#include "my_file.h"
#include "3rdparty/fancy_stuff.h"
#include "boost/expensive_lib.h"
#include <string>

you do

//my_file.cpp
#include "stdafx.h" //Common name for precompiled headers in Visual Studio
#include "my_file.h"

//stdafx.h
#include "boost/expensive_lib.h"
#include "3rdparty/fancy_stuff.h"
#include <string>

Say compiling "boost/expensive_lib.h" and "3rdparty/fancy_stuff.h" takes three seconds, and you have twenty files including these, the files themselves beeing trivial to compile, taking half a second each. That means compilation will take 20*(3s + 0.5s) = 70 seconds. With precompiled headers, this takes you 20*0.5s + 3s = 13 seconds. And the next time you compile, stdafx.h is already compiled, so it only takes you 10. That’s roughly an order of magnitude quicker.

A great improvement, no?

While it improves compilation time, it certainly does not improve readability and maintainability. Personally, I think of it as a bit of an ugly hack.

There are at least three problems with this:

1: As your project evolves, stdafx.h can get quite crowded. Come refactoring-time, you want to remove #includes which are no longer in use. While this can be challenging enough in a large cpp-file, imagine having one file holding includes from tens of others. Now you don’t only have to look through the current file to see if a certain #include can be removed, you have to look through you entire project.

2: stdafx.h now has to be the first header you include, which violates best pratice for include order.

3: Precompiled headers make .cpp-files quicker to compile. But what about when a header itself depends on other expensive headers? Normally, these need to be included in the header itself, leading to long compilation time for everyone using this header. A popular technique with precompiled headers is to not include these in the header. Since all .cpp-files will have included stdafx.h before including this header, everything will be ok. In addition to the problems laid out in 2, this poses a bigger problem when this header-file is included in .cpp-files in other projects, or even other solutions. If someone now wants to include my my_header.h, it is suddenly their responsibility to include headers that my_header.h uses internally.

That about sums it up. Consider precompiled headers added to my list of “necessary evils in C++”.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

On the Importance of Fitting in

Programming in a object oriented language can be seen as an exercise in extending the type system. And if all your code is wrapped nicely in classes and functions, what’s left is just combining those using the language. Simple, right?

Seen from this viewpoint, the importance of designing your types correctly become very important. And the best way to design them correctly, is to have them behave as much as possible as the built-in types and library types. (On a side note, this is one reason I dislike Java’s lack of operator overloading.)

As an example, say I am designing an embedded system for a car stereo. Every radio-station is stored in a RadioStation class. There is also a RadioStationContainer class that manages the radiostations. Now we need a function to add RadioStations to the container. What do we name it? What name will make a good interface for the user of this library? addRadioStation()?

I would say a much better name is push_back(). Even though you might think addRadioStation() sounds like a more intuitive name, if you are making a container, I’d argue having it behave like all other containers is more intuitive.

How about allowing people to iterate over radio stations? The iterator type will depend on the type of container RadioStationContainer is using internally. One method I’ve seen is people use something like this (oustide the RadioStationContainer class): typedef std::list<RadioStation> RSCit. This gives people a short an easy name for the iterator, right? Again I would argue you should instead make a normal typedef inside the class, so people can use the normal RadioStationContainer::iterator. If they need a shorthand, they can make their own typedef.

Here is an example of a RadioStationContainer that can be used as a normal container:

class RadioStationContainer {
public:
    //Define the normal iterator types the user will expect
    typedef list<RadioStation>::iterator iterator;
    typedef list<RadioStation>::const_iterator const_iterator;

    //Default constructor and copy constructor
    RadioStationContainer() {}
    RadioStationContainer(const RadioStationContainer& rc) {
        copy(rc.begin(), rc.end(), back_inserter(stations));
    }

    //push_back() defined with the normal container interface
    void push_back(const RadioStation& s) { stations.push_back(s); }

    //iterators for working with both const and non const RadioStationContainers
    iterator begin() { return stations.begin(); }
    iterator end() { return stations.end(); }
    const_iterator begin() const { return stations.begin(); }
    const_iterator end() const { return stations.end(); }

private:
    list<RadioStation> stations;

};

This will fit nicely with how a user of the library expects a container to behave. But there is more! This will also fit very nicely with how the Standard Template Library expects a container to behave! You have already seen an example, using copy and back_inserter in the copy constructor. But now the user is also free to use transform, for_each etc:

void doStuffWithStation(RadioStation& s);

void f(RadioStationContainer& rc) {
    for_each(rc.begin(), rc.end(), doStuffWithStation);
}

So when in doubt, always try to fit in.

Minimize the Scope of Each Variable

Whenever you declare a variable, please make sure to make its scope as small as possible. Yesterday I was refactoring a pretty large function, with lots of loops in it. It looked something like this:

void f() {
  int index;
  double delta;
  //(...)tens of more variable definitions
  for (index = 0; index < _v1.size(); ++index) {
    delta = getDelta();
    //Computations, using delta and other variables
  }
  for (index = 0; index < _v2.size(); ++ index) {
    delta = getDelta();
    //Computations, using delta and other variables
  }
  //(...) hundreds of similar lines
}

This is of course a simplified example, the real function was four hundred lines long, and full of array-indexing and computations. (It had been ported more or less verbatim from Fortran, which explains the clustering of declarations at the top.)

The problem when making a change to such a function, is that the scope you need to understand is unnecessarily large. You never know if the value of index or delta is used in some clever way further down in the function, so if are changing it in one place, you basically have to grok the entire function.

On a side note, this is also a bad idea:

void f() {
  double delta;
  for (...) {
    delta = getVal();
    //Computations, using delta and other variables
  }

Even though it looks like you have cleverly optimized the definition of delta outside of the for loop, the only thing you have done is increase its scope and decrease readability. The compiler is perfectly capable of doing such optimizations for you.

An Interface is More than Names and Arguments

If you claim to conform to an interface, it is not enough to follow the syntax, you must also be careful about the semantics.

I saw an example of this the other day, when I came across a custom vector class (different from std::vector) used for numerics. It provided much of the same interface provided by std::vector, among which, resize().

It looked something like this:

template <class T>
class Vec {
  (...)
  //! STL vector interface
  void resize(int size, T value);
}

That was however all the documentation that was available. Since it claimed to confirm to the stl::vector interface, I naturally assumed it meant the same thing.

Here is the documentation for std::vector::resize(size_type sz, T c):

If sz is smaller than the current vector size, the content is reduced to its first sz elements, the rest being dropped.

If sz is greater than the current vector size, the content is expanded by inserting at the end as many copies of c as needed to reach a size of sz elements.

Just to be sure, I had a look at the implementation of Vec::resize(int size, T value), and what do you know:

template <class T>
  //What was going on, conceptually:
  void Vec::resize(int size, T value) {
    _v.resize(0);
    _v.assign(size, value);
  }

See the difference? Vec::resize() drops and initializes all elements, whereas vector::resize() only initializes any extra elements. This might be dangerous if you for instance port from Vec to vector and rely on all elements to be reinitialized by resize(), or if you port from vector to Vec and rely on resize() to keep your old values.

So when claiming to confirm to an interface, make sure to not only adhere to the signature, but also to the semantics.

Allow the Compiler To Do Copies for You

Whenever possible, allow the compiler to do your copies for you. After all, it may know something you don’t:

bool search_for_token(std::string& str); //Tokens must be uppercase

bool check(const std::string& str) {
  string upper(str);
  makeUppercaseInplace(upper);
  return search_for_token(upper);
}

Even though the general best practice for passing non-changing objects is by reference to const, if you need to make a copy anyway, let the compiler do the job:

bool check(std::string str) {
  makeUppercaseInplace(str);
  return do_check(str);
}

What is the difference? It looks like when check() is called, a copy is made no matter if you or the compiler does it. The difference is, when you allow the compiler to do the job, you also alow it to skip the copy if it is not strictly needed. If the compiler detects that the string you are calling check with can never be used again, it can skip the copy and give that string directly to makeUpperCaseInplace(). Example:

std::string generateString();
check(generateString());

In this situation, the compiler knows that the return value from generateString() can never be used by anyone else^[1], and can reuse that object when calling check().

Compiling Boost with STLPort on Windows

I couldn’t find a concise howto on compiling Boost with STLPort on Windows, so here it is:

Download Boost. I used version 1.44, I have heard people have struggled with earlier versions.
In the Boost root directory, open project-config.jam. Add the following line:
using stlport : STLPORT_VERSION : C:/PATH_TO_STLPORT/stlport : C:/PATH_TO_STLPORT/lib ;
, substituting STLPORT_VERSION for you version of STLPort, and PATH_TO_STLPORT to you STLPort directory.
Build Boost: bjam stdlib=stlport stage. If you want to build a static lib, add link=static runtime-link=static.

My setup:

Boost 1.44
STLPort 5.2.1
Visual Studio 2008

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter

Show Me Your Signature, and I’ll Tell You Who You Are

When you write your function signatures, you have a choice between passing values, pointers or references. You might be able to make any of them work for the compiler, but what do they tell the user?

Note that even though pointers and reference are somewhat related, and mostly communicate the same thing, they have different suggestions about ownership.

Parameters

1: Pass by value

void foo(Bar b); I need to copy your object, because I need to modify it, and you don’t want to see the change. (Except for built-in types, which are usually passed by value even though they are not modified by the function.)

2: Pass by reference/pointer

void foo(Bar& b); void foo(Bar* b); I need a reference to your object, in order to modify it, beacuse you need to see the change.

3: Pass by reference to const

void foo(const Bar& b); void foo(const Bar * b); I won’t need to modify your object, and I don’t want to pay the price of a copy.

You could argue that 1 just means I will leave your object alone, and doesn’t say anything about modification. But if you aren’t going to modify it, you should use 3, so I think 1 explicitly states that the object is going to be modified (invisibly to the caller). Unless of course the argument is a built-in type.

Return values

1: Return by value

Bar foo(); Here, take this object and do whatever you want with it, I won’t touch it again. It’s yours.

2: Return a reference

Bar& foo(); There is an object over here that you can use. Someone else might silently change it, though. By the way, I own it, and it is my responsibility to delete it. (If foo() is a non-static member function, you can usually assume that won’t happen until the object on which it is called goes out of scope. If foo() is a static function, you can usually assume this won’t happen until the program exits.)

3: Return a pointer

Bar* foo(); There is an object over here that you can use. Someone else might silently change it, though. By the way, I might delete it at any time. Or maybe that’s your responsibility, and if you don’t, you’ll have a memory leak. But I won’t tell you which one it is! You need more information to be sure, for instance the documentation. Often, you can also deduce ownership from the situation. A singleton retains ownership, a factory does not.

4: Return a reference/pointer to const

const Bar& foo(); const Bar* foo(); There is an object over here that you can use, and I promise it won’t change, even if I still own it. Rules of deletion are as in 2. The reason I don’t list the ownership issues for the pointer in this case, is that I think const is an indication that ownership is retained. If it was not, why use const at all?

A Summary of “const”, Part One

Over the last two weeks, I have mentioned const a couple of times.

const is an often used keyword in C++ (though I would like to see people use it even more), and the different uses can be confusing. In this post I will try to summarize the most common uses:

Constant variables

Constant variables are simple, they cannot be changed after initialization:

const int answer = 43; //Cannot be changed  (even though I know you want to)
answer = 42; //Nope, not today.

Constant pointers

Constant pointers are a bit more involved, as the pointer can either be constant itself, point to something constant, or both, or none:

int x = 1;
const int c = 43;

const int * p1 = &c; //Non-const pointer to something const
p1 = &x; //So we can change what it points to
         //(Also note that a pointer to const can point to something non-const).

int * const p2 = &x; //Const pointer to something non-const.
p2 = c; //Not allowed, cannot change what it points to
*p2 = 2; //But can change the thing it points to

int * p3 = &c; //Cannot point to something const with a pointer to non-const
int * const p4 = &c; //Not even with a const pointer

const int * const p5 = &x; //Const ponter to something const
p5 = &c; //Cannot change what it points to
*p5 = 2; //Cannot change the thing it points to either

//And just to confuse things, it doesn't matter on which side of the type you put const, so the following are both a non-const pointer to a const int:

const int * d = &c;
int const * e = &c;
*d = 42; //Not allowed
*e = 42; //Not allowed

Const and functions

const can also be used with functions:

//Passing arguments as reference to const is a best practice, since this allows
//you to avoid copying, and still promise the caller that his object won't
//be modified.
void f(const string& s);

But there is one more use with const and functions, that is const member functions. Here, const applies to the method itself, not the parameters. A const member function promises to not modify the object on which it is called:

struct Foo {
    int getFoo() const; //Will not modify the object on which it is called
    int getMore(); //Might change the object on which it is called
};

int doFoo(const Foo& foo) {
    foo.getFoo(); //Ok
    foo.getMore(); //Not ok, cannot call non-const methods on const objects
}

Functions can also return const variables and pointers, but I’ll cover that in a follow-up where I’ll also cover operators. (Operators are in essence functions, but there is more to say about them.)

Changing the Unchangeable

Last week I asked you to please make member functions const whenever possible. But constant doesn’t always mean constant.

Declaring an object const is a way to tell the compiler that this object cannot be changed whatsoever. What you actually try to express might however be a logical immutability, that the meaning of the object should not change. That doesn’t necessarily imply that none of its member variables can be changed.

The most common example is to cache the result of an expensive operation. To expand on last weeks example:

class Whisky {
public:
  Smell smellIt() const;
  Taste tasteIt(int ml);
};

Imagine that computing the smell of the whisky is a complicated operation [1], that you don’t want to do every time someone smells it. You might want to introduce a private member to hold a cached description of the smell. This member must be possible to change, to be able to write the cached value. Computing and storing this value does however not logically modify the object, so it should be possible to do even for const objects. To allow for this, use the mutable keyword:

class Whisky {
public:
  Smell smellIt() const;
  Taste tasteIt(int ml);
private:
  mutable Smell* smell;
};

The definition of smellIt() would now look something like this:

Smell Whisky::smellIt() const {
  if (!smell)
    smell = computeSmell(); //Assume this function exists and returns a new, dynamically allocated Smell object
  return *smell;
}

Exactly how you do the caching is up to you, but if you use a simple pointer (and not for instance a shared_ptr), you must remember to delete smell in ~Whisky().

Please Make Member Functions Const Whenever Possible

When you write a member function that doesn’t modify the object it operates on, please make sure to make it const.

class Whisky {
public:
  Smell smellIt() const;
  Tase tasteIt(int ml);
};

Smelling the whisky doesn’t modify it, so I made smellIt() const. Tasting does however modify it (there is less left in the glass), so tasteIt() is not const.

Why does this matter? If someone makes a const object of the class, they might still want to call some methods on it. For instance, a good practice for a method is to take objects by reference to const. And then you will only be able to call const methods:

void describeWhisky(const Whisky& whisky) {
  std::cout << whisky.smellIt();
}

This would not be possible if smellIt() was not const. The same goes for using const_iterators and a lot of other situations, so please remember to add const whenever you can, even though it doesn’t make a differece to you then and there.

Edit: I posted a follow up, on how you can change some parts of an object, even if it is const.

	Anonymous on A prvalue is not a tempor…
	A prvalue is not a t… on A prvalue is not a tempor…
	A prvalue is not a t… on lvalues, rvalues, glvalues, pr…
	A prvalue is not a t… on A prvalue is not a tempor…
	A prvalue is not a t… on lvalues, rvalues, glvalues, pr…