Tuesday, September 25, 2007

Threading in C++0x: A retrospect

A misleading title, perhaps. Nevertheless, this *is* a retrospect, since I completed my talk on the thread-related additions in C++0x yesterday. All in all I think it turned out ok, although I should have kept it shorter.

Since many of the topics surrounding the threading libraries and language extensions are still under heavy discussion, and the papers change on a near day-to-day basis, I didn't focus too much on the currently proposed features. My approach was to build an understanding of *why* the C++ standard should know what a thread is. For the sake of anyone who might care, this blog entry is a sum-up of the "why?!"-part.

As I see it, there are three major reasons for introducing threading into the language and standard libraries.
  1. Portability
  2. Efficiency
  3. Correctness
The portability factor is pretty obvious. Since a basic thread interface isn't specified in the current standard, there are plenty of variations between OS-specific APIs. There are a few fair abstractions which wrap these, such as Boost.Thread, but depending on external libraries isn't always an option. Since the standard's sections on application correctness and execution flow also leave too much dangling, true platform independence is terribly hard to reach.

Efficiency is slightly more important than portability, in the eyes of many. CPUs are no longer seeing the explosive frequency boosts we've witnessed over the last decade. Instead, we're now being flooded with cores and additional CPUs. To be able to build ever more efficient applications, with growing complexity, turning to parallel computing is absolutely necessary. For that to happen, C++ developers sorely need a solid, thread-prepared language to work with.

Correctness is the trio's honcho, and should on its own be more than good enough a reason to welcome the new standard with glee. The sad fact is that multi-threading currently equals undefined behavior. Many may argue that it's perfectly feasible to write good multi-threaded applications with the "current" C++, and they'd of course be kinda right. In the eyes of the standard, however, there's really no telling what's going to happen when you mix multiple threads into a solution.

The C++ standard defines acceptable program flow in section 1.9, "Program Execution". The basis for this chapter is what's called an abstract machine, which specifies a set of rules of combat. One particular concept that people tend to bring up is the sequence point. Sequence points, simply put, define the order (or lack thereof) in which operations take place. When a sequence point is hit, all side effects of prior execution must be completed.

An example: there's a sequence point at the end of a full expression (';'), but there are no sequence points between the arguments to a function call. Therefore, you can never rely on arguments to be evaluated in any particular order.
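To make that a bit more concrete, here is a minimal sketch. The names `record`, `add` and `evaluation_log` are made up for illustration. Both arguments are guaranteed to be fully evaluated before the body of `add` runs, but which of the two goes first is up to the compiler.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper state: remembers the order in which arguments were evaluated.
std::vector<int> evaluation_log;

int record(int id) {
    evaluation_log.push_back(id);  // side effect we can inspect afterwards
    return id;
}

int add(int a, int b) {
    // Both argument expressions have completed before we get here...
    assert(evaluation_log.size() == 2);
    return a + b;
}

int call_with_side_effects() {
    evaluation_log.clear();
    // ...but the log may now hold {1, 2} or {2, 1}; the standard allows either.
    return add(record(1), record(2));
}
```

The return value is the same either way. Only code that depends on the contents of `evaluation_log` being in one particular order is in trouble.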

We haven't gotten to the interesting part yet, though. Sequence points are all well and good, but the problems start popping up when you consider what the standard demands of a conforming implementation. It is only required to emulate the observable behavior of the abstract machine. The observable behavior consists of reads and writes to volatile data, and calls to library I/O functions. Seen from the point of view of a novice programmer, this would seem like utter mayhem. Unless *everything* is volatile, there's (nearly) no telling in what order a set of code lines will be executed. It's of course a quite necessary evil, as the compiler would be unable to optimize anything if it weren't allowed to do this. For multi-threaded programming, on the other hand, it can easily be the beginning of a slow and horrible death.
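A small sketch of what that freedom means in practice. The names `payload`, `ready` and `publish` are hypothetical. Neither store below touches volatile data or does I/O, so neither counts as observable behavior, and the compiler is free to perform them in either order.

```cpp
// Hypothetical shared state: plain, non-volatile, non-atomic variables.
int payload = 0;
bool ready = false;

void publish() {
    payload = 42;   // (1)
    ready = true;   // (2) may legally be moved ahead of (1)
}

// Seen from the thread that called publish(), the end result is always
// the same, so a single-threaded program can never tell the difference.
bool single_threaded_check() {
    publish();
    return ready && payload == 42;
}
```

Within one thread this check always succeeds. A second thread polling `ready` has no such luck: nothing in the current standard stops it from seeing `ready == true` while `payload` is still 0.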

At this point in my talk, I brought up an example from an article written by Scott Meyers and Andrei Alexandrescu in 2004: C++ and the Perils of Double-Checked Locking. I won't pull that back out here now, as you should rather read it at the source. The simple conclusion: You can spend your remaining days trying to trick the compiler into not optimizing your code, with volatiles, forklifts and C&D's, but it won't make a spliff of difference. If you succeed in "beating" the compiler, you're likely to smack head-on into a CPU which is allowed to re-order memory operations just about as it sees fit. And if that doesn't end you, the fact that your threads might not even see the same content when dealing with the same variable surely will. At that point you can pull a hefty array of memory barriers out of your pockets to solve the correctness problems (at least for a while), but doing so will most certainly have taken its toll on both your portability and mental health.

The thread extensions planned for C++0x aren't meant to save everyone, in any possible situation, but they are an attempt to make sense of what's going on in multi-threaded code. Not only will you have a portable way of creating threads, and a sound amount of explanation on what to expect and not expect from a threaded execution; you'll also have atomic types to work with, which are quite effectively capable of preventing destructive optimization and memory reordering in the CPU, as well as dealing with cache coherency. Writing a portable, safe, and not even half ugly looking version of the Singleton from the mentioned DCLP article would *nearly* be as simple as wrapping a pointer with an atomic template. I'll leave that for a later post, though.
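Without pre-empting the Singleton, here is a rough taste of the first two pieces: portable threads and an atomic type. Treat it as a sketch under assumptions. The `std::thread` and `std::atomic` names are the ones found in the published C++11 standard and may not match the drafts discussed above, and `count_with_threads` is a made-up helper.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter.
int count_with_threads(int num_threads, int increments_per_thread) {
    std::atomic<int> counter(0);
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomic read-modify-write; no increment can be lost,
                // and no OS-specific thread or barrier API is needed.
                counter.fetch_add(1);
            }
        });
    }

    // Wait for every worker before reading the final value.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

Calling `count_with_threads(4, 10000)` should return 40000 on every conforming platform. With a plain `int` counter instead, the result would be anybody's guess.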