What I Have Against Precompiled Headers


Precompiled headers is a technique to reduce compilation time when your #includes take a long time. Basically, you stuff all your #includes that rarely change into one single file, which you then include in all your other files. You then tell the compiler to precompile this header, so it takes a negligible amount of time to include later on.

Instead of doing:

//my_file.cpp
#include "my_file.h"
#include "3rdparty/fancy_stuff.h"
#include "boost/expensive_lib.h"
#include <string>

you do

//my_file.cpp
#include "stdafx.h" //Common name for precompiled headers in Visual Studio
#include "my_file.h"

//stdafx.h
#include "boost/expensive_lib.h"
#include "3rdparty/fancy_stuff.h"
#include <string>

Say compiling "boost/expensive_lib.h" and "3rdparty/fancy_stuff.h" takes three seconds, and you have twenty files including these, the files themselves beeing trivial to compile, taking half a second each. That means compilation will take 20*(3s + 0.5s) = 70 seconds. With precompiled headers, this takes you 20*0.5s + 3s = 13 seconds. And the next time you compile, stdafx.h is already compiled, so it only takes you 10. That’s roughly an order of magnitude quicker.

A great improvement, no?

While it improves compilation time, it certainly does not improve readability and maintainability. Personally, I think of it as a bit of an ugly hack.

There are at least three problems with this:

1: As your project evolves, stdafx.h can get quite crowded. Come refactoring-time, you want to remove #includes which are no longer in use. While this can be challenging enough in a large cpp-file, imagine having one file holding includes from tens of others. Now you don’t only have to look through the current file to see if a certain #include can be removed, you have to look through you entire project.

2: stdafx.h now has to be the first header you include, which violates best pratice for include order.

3: Precompiled headers make .cpp-files quicker to compile. But what about when a header itself depends on other expensive headers? Normally, these need to be included in the header itself, leading to long compilation time for everyone using this header. A popular technique with precompiled headers is to not include these in the header. Since all .cpp-files will have included stdafx.h before including this header, everything will be ok. In addition to the problems laid out in 2, this poses a bigger problem when this header-file is included in .cpp-files in other projects, or even other solutions. If someone now wants to include my my_header.h, it is suddenly their responsibility to include headers that my_header.h uses internally.

That about sums it up. Consider precompiled headers added to my list of “necessary evils in C++”.

If you enjoyed this post, you can subscribe to my blog, or follow me on Twitter.

On the Importance of Fitting in


Programming in a object oriented language can be seen as an exercise in extending the type system. And if all your code is wrapped nicely in classes and functions, what’s left is just combining those using the language. Simple, right?

Seen from this viewpoint, the importance of designing your types correctly become very important. And the best way to design them correctly, is to have them behave as much as possible as the built-in types and library types. (On a side note, this is one reason I dislike Java’s lack of operator overloading.)

As an example, say I am designing an embedded system for a car stereo. Every radio-station is stored in a RadioStation class. There is also a RadioStationContainer class that manages the radiostations. Now we need a function to add RadioStations to the container. What do we name it? What name will make a good interface for the user of this library? addRadioStation()?

I would say a much better name is push_back(). Even though you might think addRadioStation() sounds like a more intuitive name, if you are making a container, I’d argue having it behave like all other containers is more intuitive.

How about allowing people to iterate over radio stations? The iterator type will depend on the type of container RadioStationContainer is using internally. One method I’ve seen is people use something like this (oustide the RadioStationContainer class): typedef std::list<RadioStation> RSCit. This gives people a short an easy name for the iterator, right? Again I would argue you should instead make a normal typedef inside the class, so people can use the normal RadioStationContainer::iterator. If they need a shorthand, they can make their own typedef.

Here is an example of a RadioStationContainer that can be used as a normal container:

class RadioStationContainer {
public:
    //Define the normal iterator types the user will expect
    typedef list<RadioStation>::iterator iterator;
    typedef list<RadioStation>::const_iterator const_iterator;

    //Default constructor and copy constructor
    RadioStationContainer() {}
    RadioStationContainer(const RadioStationContainer& rc) {
        copy(rc.begin(), rc.end(), back_inserter(stations));
    }

    //push_back() defined with the normal container interface
    void push_back(const RadioStation& s) { stations.push_back(s); }

    //iterators for working with both const and non const RadioStationContainers
    iterator begin() { return stations.begin(); }
    iterator end() { return stations.end(); }
    const_iterator begin() const { return stations.begin(); }
    const_iterator end() const { return stations.end(); }

private:
    list<RadioStation> stations;

};

This will fit nicely with how a user of the library expects a container to behave. But there is more! This will also fit very nicely with how the Standard Template Library expects a container to behave! You have already seen an example, using copy and back_inserter in the copy constructor. But now the user is also free to use transform, for_each etc:

void doStuffWithStation(RadioStation& s);

void f(RadioStationContainer& rc) {
    for_each(rc.begin(), rc.end(), doStuffWithStation);
}

So when in doubt, always try to fit in.

Minimize the Scope of Each Variable


Whenever you declare a variable, please make sure to make its scope as small as possible. Yesterday I was refactoring a pretty large function, with lots of loops in it. It looked something like this:

void f() {
  int index;
  double delta;
  //(...)tens of more variable definitions
  for (index = 0; index < _v1.size(); ++index) {
    delta = getDelta();
    //Computations, using delta and other variables
  }
  for (index = 0; index < _v2.size(); ++ index) {
    delta = getDelta();
    //Computations, using delta and other variables
  }
  //(...) hundreds of similar lines
}

This is of course a simplified example, the real function was four hundred lines long, and full of array-indexing and computations. (It had been ported more or less verbatim from Fortran, which explains the clustering of declarations at the top.)

The problem when making a change to such a function, is that the scope you need to understand is unnecessarily large. You never know if the value of index or delta is used in some clever way further down in the function, so if are changing it in one place, you basically have to grok the entire function.

On a side note, this is also a bad idea:

void f() {
  double delta;
  for (...) {
    delta = getVal();
    //Computations, using delta and other variables
  }

Even though it looks like you have cleverly optimized the definition of delta outside of the for loop, the only thing you have done is increase its scope and decrease readability. The compiler is perfectly capable of doing such optimizations for you.

Posted in C++. Tags: . 2 Comments »

An Interface is More than Names and Arguments


If you claim to conform to an interface, it is not enough to follow the syntax, you must also be careful about the semantics.

I saw an example of this the other day, when I came across a custom vector class (different from std::vector) used for numerics. It provided much of the same interface provided by std::vector, among which, resize().

It looked something like this:

template <class T>
class Vec {
  (...)
  //! STL vector interface
  void resize(int size, T value);
}

That was however all the documentation that was available. Since it claimed to confirm to the stl::vector interface, I naturally assumed it meant the same thing.

Here is the documentation for std::vector::resize(size_type sz, T c):

If sz is smaller than the current vector size, the content is reduced to its first sz elements, the rest being dropped.

If sz is greater than the current vector size, the content is expanded by inserting at the end as many copies of c as needed to reach a size of sz elements.

Just to be sure, I had a look at the implementation of Vec::resize(int size, T value), and what do you know:

template <class T>
  //What was going on, conceptually:
  void Vec::resize(int size, T value) {
    _v.resize(0);
    _v.assign(size, value);
  }

See the difference? Vec::resize() drops and initializes all elements, whereas vector::resize() only initializes any extra elements. This might be dangerous if you for instance port from Vec to vector and rely on all elements to be reinitialized by resize(), or if you port from vector to Vec and rely on resize() to keep your old values.

So when claiming to confirm to an interface, make sure to not only adhere to the signature, but also to the semantics.

Allow the Compiler To Do Copies for You


Whenever possible, allow the compiler to do your copies for you. After all, it may know something you don’t:

bool search_for_token(std::string& str); //Tokens must be uppercase

bool check(const std::string& str) {
  string upper(str);
  makeUppercaseInplace(upper);
  return search_for_token(upper);
}

Even though the general best practice for passing non-changing objects is by reference to const, if you need to make a copy anyway, let the compiler do the job:

bool check(std::string str) {
  makeUppercaseInplace(str);
  return do_check(str);
}

What is the difference? It looks like when check() is called, a copy is made no matter if you or the compiler does it. The difference is, when you allow the compiler to do the job, you also alow it to skip the copy if it is not strictly needed. If the compiler detects that the string you are calling check with can never be used again, it can skip the copy and give that string directly to makeUpperCaseInplace(). Example:

std::string generateString();
check(generateString());

In this situation, the compiler knows that the return value from generateString() can never be used by anyone else[1], and can reuse that object when calling check().