On the Importance of Fitting in


Programming in a object oriented language can be seen as an exercise in extending the type system. And if all your code is wrapped nicely in classes and functions, what’s left is just combining those using the language. Simple, right?

Seen from this viewpoint, the importance of designing your types correctly become very important. And the best way to design them correctly, is to have them behave as much as possible as the built-in types and library types. (On a side note, this is one reason I dislike Java’s lack of operator overloading.)

As an example, say I am designing an embedded system for a car stereo. Every radio-station is stored in a RadioStation class. There is also a RadioStationContainer class that manages the radiostations. Now we need a function to add RadioStations to the container. What do we name it? What name will make a good interface for the user of this library? addRadioStation()?

I would say a much better name is push_back(). Even though you might think addRadioStation() sounds like a more intuitive name, if you are making a container, I’d argue having it behave like all other containers is more intuitive.

How about allowing people to iterate over radio stations? The iterator type will depend on the type of container RadioStationContainer is using internally. One method I’ve seen is people use something like this (oustide the RadioStationContainer class): typedef std::list<RadioStation> RSCit. This gives people a short an easy name for the iterator, right? Again I would argue you should instead make a normal typedef inside the class, so people can use the normal RadioStationContainer::iterator. If they need a shorthand, they can make their own typedef.

Here is an example of a RadioStationContainer that can be used as a normal container:

class RadioStationContainer {
public:
    //Define the normal iterator types the user will expect
    typedef list<RadioStation>::iterator iterator;
    typedef list<RadioStation>::const_iterator const_iterator;

    //Default constructor and copy constructor
    RadioStationContainer() {}
    RadioStationContainer(const RadioStationContainer& rc) {
        copy(rc.begin(), rc.end(), back_inserter(stations));
    }

    //push_back() defined with the normal container interface
    void push_back(const RadioStation& s) { stations.push_back(s); }

    //iterators for working with both const and non const RadioStationContainers
    iterator begin() { return stations.begin(); }
    iterator end() { return stations.end(); }
    const_iterator begin() const { return stations.begin(); }
    const_iterator end() const { return stations.end(); }

private:
    list<RadioStation> stations;

};

This will fit nicely with how a user of the library expects a container to behave. But there is more! This will also fit very nicely with how the Standard Template Library expects a container to behave! You have already seen an example, using copy and back_inserter in the copy constructor. But now the user is also free to use transform, for_each etc:

void doStuffWithStation(RadioStation& s);

void f(RadioStationContainer& rc) {
    for_each(rc.begin(), rc.end(), doStuffWithStation);
}

So when in doubt, always try to fit in.

An Interface is More than Names and Arguments


If you claim to conform to an interface, it is not enough to follow the syntax, you must also be careful about the semantics.

I saw an example of this the other day, when I came across a custom vector class (different from std::vector) used for numerics. It provided much of the same interface provided by std::vector, among which, resize().

It looked something like this:

template <class T>
class Vec {
  (...)
  //! STL vector interface
  void resize(int size, T value);
}

That was however all the documentation that was available. Since it claimed to confirm to the stl::vector interface, I naturally assumed it meant the same thing.

Here is the documentation for std::vector::resize(size_type sz, T c):

If sz is smaller than the current vector size, the content is reduced to its first sz elements, the rest being dropped.

If sz is greater than the current vector size, the content is expanded by inserting at the end as many copies of c as needed to reach a size of sz elements.

Just to be sure, I had a look at the implementation of Vec::resize(int size, T value), and what do you know:

template <class T>
  //What was going on, conceptually:
  void Vec::resize(int size, T value) {
    _v.resize(0);
    _v.assign(size, value);
  }

See the difference? Vec::resize() drops and initializes all elements, whereas vector::resize() only initializes any extra elements. This might be dangerous if you for instance port from Vec to vector and rely on all elements to be reinitialized by resize(), or if you port from vector to Vec and rely on resize() to keep your old values.

So when claiming to confirm to an interface, make sure to not only adhere to the signature, but also to the semantics.

Show Me Your Signature, and I’ll Tell You Who You Are


When you write your function signatures, you have a choice between passing values, pointers or references. You might be able to make any of them work for the compiler, but what do they tell the user?

Note that even though pointers and reference are somewhat related, and mostly communicate the same thing, they have different suggestions about ownership.

Parameters

1: Pass by value

void foo(Bar b); I need to copy your object, because I need to modify it, and you don’t want to see the change. (Except for built-in types, which are usually passed by value even though they are not modified by the function.)

2: Pass by reference/pointer

void foo(Bar& b); void foo(Bar* b); I need a reference to your object, in order to modify it, beacuse you need to see the change.

3: Pass by reference to const

void foo(const Bar& b); void foo(const Bar * b); I won’t need to modify your object, and I don’t want to pay the price of a copy.

You could argue that 1 just means I will leave your object alone, and doesn’t say anything about modification. But if you aren’t going to modify it, you should use 3, so I think 1 explicitly states that the object is going to be modified (invisibly to the caller). Unless of course the argument is a built-in type.

Return values

1: Return by value

Bar foo(); Here, take this object and do whatever you want with it, I won’t touch it again. It’s yours.

2: Return a reference

Bar& foo(); There is an object over here that you can use. Someone else might silently change it, though. By the way, I own it, and it is my responsibility to delete it. (If foo() is a non-static member function, you can usually assume that won’t happen until the object on which it is called goes out of scope. If foo() is a static function, you can usually assume this won’t happen until the program exits.)

3: Return a pointer

Bar* foo(); There is an object over here that you can use. Someone else might silently change it, though. By the way, I might delete it at any time. Or maybe that’s your responsibility, and if you don’t, you’ll have a memory leak. But I won’t tell you which one it is! You need more information to be sure, for instance the documentation. Often, you can also deduce ownership from the situation. A singleton retains ownership, a factory does not.

4: Return a reference/pointer to const

const Bar& foo(); const Bar* foo(); There is an object over here that you can use, and I promise it won’t change, even if I still own it. Rules of deletion are as in 2. The reason I don’t list the ownership issues for the pointer in this case, is that I think const is an indication that ownership is retained. If it was not, why use const at all?

Make APIs hard to use incorrectly


So far I have written a couple of posts about bad practices I have found in other peoples code. Todays blunder is however one I was about to commit myself. A library I was extending had this (as usual, simplified) function:

void Library::write(const char* c, int i);

This was however not always a safe thing to do, so I found all the callers had wrapped it with almost identical code to protect it. I decided to add a method to the library, doing exactly the same thing, but with the added safety-wrapping. Fortunately, I didn’t even need to change the signature, so I could easily make a new function

void Library::writeSafely(const char* c, int i) {
    safety();
    write(c,i);
    more_safety();
}

and change all the callers to use this new function. Luckily, we still had control of all the users of this API, so this was no big deal. Can you spot my mistake?

The title is kind of revealing, hinting at the API design best practice “Make Interfaces Easy to Use Correctly and Hard to Use Incorrectly”, which for instance Scott Meyers mentions both in 97 Things Every Programmer Should Know (Chapter 55) and Effective C++ (Item 18).

If you need to provide both a safe and an unsafe version of a function, make the safe version the easiest to use. In this case, instead of just adding the new function, I renamed the old function write_raw(), and made a new, safe function write() that wrapped write_raw().

In a few months, someone unfamiliar with this API will come along looking to write something, and will most certainly use write() after skimming through the API. He will now be defaulting to the safe version, if he wants raw access he will need to dig a bit further.

And that someone might just as well be myself.