Wednesday, November 22, 2006

Function static variables in multi-threaded environments

The instantiation process of function static variables isn't necessarily what you expect it to be. Raymond Chen posted a nice bit on this some time back. If you haven't read it, go do so now. I won't repeat it here.

At this point, there's a 90% chance that you've actually skipped the recommended read, so I'll go ahead and sum it up real quick. Yeah I know I said I wouldn't. I lied. Get on with your life. Anyhoo, here goes the five second recap ..

Consider the following function:

void foo()
{
    static int x = calcSomething();
}

It seems simple enough, and it is. The static variable will be initialized once, based on the result of the function calcSomething. With non-volatile constant values, the compiler can optimize the generated code to use the memory address of the value. In this case, where a function is called (a function we know nothing about, might I add), it doesn't necessarily have that luxury. Looking at the generated assembly code, we'll see something like this:

mov     eax,1 
test byte ptr [$S1],al
jne foo+1Dh
or dword ptr [$S1],eax
call calcSomething
mov dword ptr [x],eax

Loosely translated to pseudo C++, this will be

void foo()
{
    static bool x_set = false;
    static int x;
    if(!x_set)
    {
        x_set = true;
        x = calcSomething();
    }
}

As you can see, there's no interlocking code here. This essentially means that the function will be anything but thread safe. One thread may pass the x_set test, but not yet execute x_set = true, only to be swapped out in favor of another thread which does the same. The result would be that calcSomething is executed two or more times -- which is likely to be a bad thing.

That's it for the recap. Now, if we'd like to fix this problem, what comes to mind? Interlocking, obviously...
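
To make the idea concrete, here is a minimal sketch of the same function with the flag guarded by a lock. It assumes a C++11 compiler and uses std::mutex instead of any Win32 primitive; the body of calcSomething, the calcCalls counter and the runFooConcurrently helper are made up for illustration, and foo returns x only so the result can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Placeholder for the unknown function; counts how often it runs.
std::atomic<int> calcCalls{0};

int calcSomething()
{
    ++calcCalls;
    return 42;
}

// Guards the "has x been set?" flag below.
std::mutex x_lock;

// Same shape as the pseudo C++ above, but the flag is only tested
// and set while the lock is held.
int foo()
{
    static bool x_set = false;
    static int x;

    std::lock_guard<std::mutex> guard(x_lock);
    if(!x_set)
    {
        x = calcSomething();
        x_set = true;
    }
    return x;
}

// Calls foo() from several threads at once and reports how many
// times calcSomething() actually ran.
int runFooConcurrently(int threadCount)
{
    assert(threadCount > 0);
    std::vector<std::thread> threads;
    for(int i = 0; i < threadCount; ++i)
        threads.emplace_back([] { foo(); });
    for(std::thread& t : threads)
        t.join();
    return calcCalls.load();
}
```

With the lock in place, runFooConcurrently(8) should report a single call to calcSomething no matter how the threads interleave. Worth knowing: from C++11 on, the language itself requires the initialization of a function static to be thread safe, so a hand-rolled lock like this mostly matters for older compilers such as the one discussed here.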

This article can now be found at

Monday, November 20, 2006

Cross thread calls in native C++, #4

If you haven't read the previous entries in this series, head back to them before continuing.

There are a few restrictions on the use of a framework such as the one described here. Some are merely points to be wary of, while others are showstoppers.

The parameters passed to a function which will be called from another thread should not use the TLS (Thread Local Storage) specifier; that goes without saying. A variable declared TLS (__declspec(thread)) will have one copy per thread it's accessed from. In terms of the previous example, the main thread would not necessarily see the same data as testThread, even with the value passed through the synchronized call mechanism to testFunction. In short: there's nothing stopping you from passing TLS, but through doing so you are bound to see some odd behavior. The general guideline is to be thoughtful. Don't pass anything between threads without knowing exactly what the consequences are. Even though the mechanism, or rather principle, of cross thread synchronized calls goes to great lengths to keep the task simple, there are always ways to stumble.
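
A small sketch of the TLS pitfall, assuming a C++11 compiler and using the standard thread_local keyword as a stand-in for __declspec(thread). The names tlsValue, readTlsValue and readFromOtherThread are hypothetical and not part of the framework.

```cpp
#include <cassert>
#include <thread>

// One copy of this variable exists per thread.
thread_local int tlsValue = 0;

// Stand-in for a function executed in context of another thread:
// it reads the copy belonging to whichever thread runs it.
int readTlsValue()
{
    return tlsValue;
}

// Sets tlsValue in the calling thread, then reads it from a second
// thread and returns what that thread saw.
int readFromOtherThread(int valueSetByCaller)
{
    tlsValue = valueSetByCaller;
    int seenByOther = -1;
    std::thread other([&seenByOther] { seenByOther = readTlsValue(); });
    other.join();
    assert(tlsValue == valueSetByCaller);   // the caller's copy is untouched
    return seenByOther;
}
```

Calling readFromOtherThread(7) hands back 0 rather than 7: the second thread only ever sees its own, freshly initialized copy.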

A couple of guidelines and requirements regarding parameter passing and returning, in an example scenario where Thread A does a synchronized call to Function F through Thread B:

  • If Function F has to return pointers or references, make them const so they cannot be touched by Thread A. Even when they are const, Thread B can free them, and thus make reads from Thread A crash. Don't use pointers or references unless you are absolutely sure this won't happen. Returned pointers or references belong to Thread B.
  • If Function F accepts pointers or references as parameters from Thread A, make sure that they aren't referenced by Thread B, neither read nor written, once F returns. Passed pointers or references belong to Thread A.
  • Class types returned from Function F to Thread A must provide a public copy constructor, either compiler generated or user implemented. 
  • At the time of writing, the framework does not support exception transport between the threads. If Function F happens to throw an exception, this will be captured by the framework. The CallScheduler will then throw a new, unrelated exception back to Thread A. This makes it difficult for Thread A to track which exception really occurred in Function F, so I advise against throwing from F in the first place.
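
The last point can be illustrated with a rough, single-threaded sketch. It is not the framework's code: CallFailed, runAndTranslate and describeFailure are hypothetical names, and the hop between threads is left out so that only the loss of information is shown.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for whatever the scheduler throws to Thread A.
struct CallFailed : std::runtime_error
{
    CallFailed() : std::runtime_error("scheduled call threw an exception") {}
};

// Runs f the way a scheduler without exception transport would:
// whatever f throws is swallowed and replaced by CallFailed.
int runAndTranslate(const std::function<int()>& f)
{
    assert(f != nullptr);
    try
    {
        return f();
    }
    catch(...)
    {
        throw CallFailed();   // the original type and message are lost here
    }
}

// Returns the only thing a caller in Thread A would be able to report.
std::string describeFailure(const std::function<int()>& f)
{
    try
    {
        runAndTranslate(f);
        return "no exception";
    }
    catch(const std::exception& e)
    {
        return e.what();
    }
}
```

Whether F throws std::out_of_range or something entirely different, the caller ends up with the same generic message, which is why returning an error value from F is the friendlier choice.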

I'm currently wrapping up the framework, and I'll post a demo project here within the next few days. There are still some remaining points on my TODO-list, but I'll disregard most of them for the time being.

Wednesday, November 15, 2006

Cross thread calls in native C++, #3

If you haven't read number one or two in this series, head back to those before continuing.

Ok, so we've covered the motivation, as well as some of the requirements. It's time to give an example of how the mechanism can be used. For the sake of utter simplicity, I will not bring classes and objects into the puzzle just yet. Just imagine the following simple console program:

#include <windows.h>
#include <iostream>
#include <string>
using namespace std;

char globalBuffer[20];
HANDLE hExternalEvent; // assumed to be created elsewhere, e.g. with CreateEvent

DWORD WINAPI testThread(LPVOID)
{
    // Keep sleeping while the event is unset
    while(WaitForSingleObjectEx(hExternalEvent, INFINITE, TRUE) != WAIT_OBJECT_0)
        ;

    // Alter the global data
    for(int i = 0; i < sizeof(globalBuffer) - 1; ++i)
        globalBuffer[i] = 'b';
    globalBuffer[sizeof(globalBuffer) - 1] = 0; // null terminate

    // Return and terminate the thread
    return 0;
}

int main()
{
    DWORD dwThreadId;
    CreateThread(NULL, 0, testThread, NULL, 0, &dwThreadId);
}

There's nothing out of the ordinary so far. We've got the entry point, main, and a function, testThread. When main is executed, it will create and spawn a new thread on testThread. All testThread does in this example is wait for an external event to be signaled, and then alter a data structure, globalBuffer. What's important is that the thread is waiting for something to happen, and while it's waiting we can instruct it to do some other stuff. Our objective is therefore to have the thread call another function, testFunction:

string testFunction(char c)
{
    for(int i = 0; i < sizeof(globalBuffer) - 1; ++i)
        globalBuffer[i] = c;
    globalBuffer[sizeof(globalBuffer) - 1] = 0; // null terminate
    return globalBuffer;
}

testFunction will alter the global buffer, setting all elements except the last to the value of the char parameter c, then null terminate it and finally return a new string with the global buffer's content. What we can tell straight away is that testFunction and testThread may alter the same buffer. If our main thread executed testFunction directly, it could get around to altering the first 10 or so elements of the global buffer before being swapped out of the CPU. If the external event in testThread were to be signaled at this point, that thread would also start altering the buffer. The string returned from testFunction would obviously contain anything but what we expect it to.

While this example doesn't make much sense in terms of a real world application as it is, the concept is very much realistic. Imagine, if you wish, that the global buffer represents the text in an edit box within a dialog, and that testThread is supposed to alter this text based on a timer. At certain intervals, external threads may also wish to update the same edit box with additional information, so they call into the GUI's class (which in this simplistic example is represented by testFunction). To avoid crashes, garbled text in the text box, or other freaky results, we want to synchronize the access. We don't want to add a heap of mutexes or critical sections to our code, but rather just have the GUI thread call the function which updates the text. When the GUI thread alone is in charge of updating its resources, we're guaranteed that all operations go about in a tidy order. In other words: there will be no headache-causing crashes and angry customers.

So, instead of adding a whole lot of interlocking code to both testThread and testFunction, which both update the global buffer, we use a cross thread call library to have the thread which owns the shared data do all the work.

int main()
{
    DWORD dwThreadId;
    CreateThread(NULL, 0, testThread, NULL, 0, &dwThreadId);

    CallScheduler<APCPickupPolicy>* scheduler = CallScheduler<APCPickupPolicy>::getInstance();

    try
    {
        string dataString = scheduler->syncCall<string>(dwThreadId, boost::bind(testFunction, 'a'), 500);
        cout << "testFunction returned: " << dataString << endl;
    }
    catch(CallTimeoutException&) // the exception type names were lost from this listing; both are assumed
    {
        cout << "Call timeout" << endl;
    }
    catch(CallSchedulingFailedException&)
    {
        cout << "Call scheduling failed" << endl;
    }

    return 0;
}

CallScheduler makes all the difference here. By obtaining a pointer to this singleton class, with the preferred pickup policy (in this case the APCPickupPolicy), we can schedule calls to be made in context of other threads, granted that they are open for whatever mechanism the pickup policy uses. In our current example, we know that the testThread wait is alertable, and that suits the APC policy perfectly. To attempt to execute the call in the other thread, we call the syncCall function, with a few parameters. The template parameter is the return type of the function we wish to execute, in this case a string. The first parameter is the id of the thread in which we wish to perform the operation, the second parameter is a boost functor, and the third is the number of milliseconds we are willing to wait for the call to be initiated. The use of boost functors also allows us to bind the parameters in a timely fashion. As you can see in the above call, testFunction should be called with the char 'a' as its sole parameter.

At this point, we wait. The call will be scheduled, and will hopefully complete. If the pickup policy does its work, the call will be executed in the other thread, and we will soon get the string from testFunction as returned by syncCall. Should the pickup fail or time out, an exception will be thrown. Consider the example -- it really should make it all pretty clear.

As for limitations, restrictions, guarantees in terms of reliability and how the framework is designed: I'll get back to that in the next update. Once again, stay tuned.

Tuesday, November 14, 2006

Cross thread calls in native C++, #2

If you haven't read number one in this series, see that before continuing.

Throughout the last few years, I've had a number of approaches to this field of problems. Usually, I've ended up using a mix of #2 and #3 as listed earlier. While I've made a few abstractions, and integrated this in a threading library, there was nothing major about it. It wasn't until I had a crack at the .NET framework, and more specifically the InvokeRequired / BeginInvoke techniques, that I started pondering doing the same in a native framework. The .NET framework approach really is appealing from a usage point of view, as it introduces a bare minimum of alien code to, say, the business logic. While many would argue that the ideal approach would be to avoid synchronization altogether, and rely on the operating system to deal with the complexities related to cross thread calls and simultaneous data access, that's not likely to be part of any efficiency focused application anytime soon.

I won't go into the details of my first few synchronization frameworks, but will rather focus on the one I typed up specially for this read. It is, as mentioned, based on the ideas from the .NET framework, but it's not quite the same. Given the differences between native and managed code, as well as the syntactic differences, the mechanics have to be a little different, and so does the use. The motivation of the framework is obviously to simplify cross thread calls, which may or may not access shared resources. It goes to great lengths to be safe, flexible, and reliable in terms of its promises to the user. The flexibility is achieved through the introduction of templated policies for the notifications made across the threads, as well as functors and parameter bindings from Boost. I'll get back to the reliable part in a jiffy.

The base principle is quite simple. Thread A needs to update or process data logically related to Thread B. To do this, A wants to issue a call in context of B. Thread B is of a nature which allows it to sleep or wait for commands from external sources, so that'll be the window in which A can make its move. Thread B would ideally be GUI related, a network server / client, an Observer (as in the Observer Pattern) or similar.

What needs to be done is:

  1. Thread A must call a function to schedule execution in Thread B, with or without parameters.
  2. While the call waits to be executed, Thread A must be suspended. If the call doesn't end within a critical period of time, the control must be given back to Thread A, with a notification that the call failed. If A is notified of a call timeout, the call must be guaranteed not to take place.
  3. Thread B is notified that a call should be executed. We'll call this the PickupPolicy, since B will have to pick up an instruction from A to do some task. This is where the policy comes in.
  4. Thread B will execute the scheduled call, which may or may not return a value, and continue about its business.
  5. Thread A returns the resulting value, and also picks up where it left off.
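
The five steps above can be sketched roughly as follows. This is not the framework itself: CallQueue, pickupOne and addInOwnerThread are hypothetical names, the notification is a plain condition variable instead of a pickup policy, and only int-returning calls are handled. It assumes a C++11 compiler with the standard threading headers.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical stand-in for the scheduler; int-returning calls only.
class CallQueue
{
    enum State { Pending, Running, Cancelled };

    struct Call
    {
        std::function<int()> func;
        std::promise<int> result;
        std::atomic<State> state{Pending};
    };

    std::mutex lock_;
    std::condition_variable wake_;
    std::deque<std::shared_ptr<Call>> calls_;

public:
    // Steps 1, 2 and 5, executed by Thread A.
    int syncCall(std::function<int()> func, std::chrono::milliseconds timeout)
    {
        assert(func != nullptr);
        auto call = std::make_shared<Call>();
        call->func = std::move(func);
        std::future<int> result = call->result.get_future();
        {
            std::lock_guard<std::mutex> guard(lock_);
            calls_.push_back(call);
        }
        wake_.notify_one();   // step 3: tell Thread B there is work

        if(result.wait_for(timeout) != std::future_status::ready)
        {
            // Report a timeout only if the call could still be cancelled,
            // so a timed-out call is guaranteed never to run.
            State expected = Pending;
            if(call->state.compare_exchange_strong(expected, Cancelled))
                throw std::runtime_error("call timeout");
        }
        return result.get();   // B has started the call: wait for its value
    }

    // Step 4, executed by Thread B whenever it is willing to pick up work.
    // Returns false if nothing was queued within maxWait.
    bool pickupOne(std::chrono::milliseconds maxWait)
    {
        std::shared_ptr<Call> call;
        {
            std::unique_lock<std::mutex> guard(lock_);
            if(!wake_.wait_for(guard, maxWait, [this] { return !calls_.empty(); }))
                return false;
            call = calls_.front();
            calls_.pop_front();
        }
        State expected = Pending;
        if(call->state.compare_exchange_strong(expected, Running))
            call->result.set_value(call->func());
        return true;
    }
};

// Thread B is the only thread touching `total`; Thread A changes it
// solely through syncCall.
int addInOwnerThread(int amount)
{
    CallQueue queue;
    int total = 100;
    std::atomic<bool> done{false};
    std::thread threadB([&queue, &done] {
        while(!done)
            queue.pickupOne(std::chrono::milliseconds(10));
    });
    int result = queue.syncCall([&total, amount] { return total += amount; },
                                std::chrono::milliseconds(2000));
    done = true;
    threadB.join();
    return result;
}
```

The compare-and-exchange on the call state is what backs the promise in step 2: either Thread A cancels the call before it starts, or Thread B claims it and A waits for the value. The sketch makes no attempt to deal with a scheduled function that throws.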

The pickup policy, or more specifically the way Thread A delivers the notification to Thread B, can involve a number of different techniques. A couple worth mentioning are user APCs (user-mode asynchronous procedure calls) and Window Messages. QueueUserAPC allows one to queue a function for calling in context of a different thread, and relies on the other thread to enter an alertable wait for the call to be made. Alertable waits have their share of problems, but I'll disregard those for now. In terms of the GUI type thread, window messages are a better alternative. The pickup policies make up a fairly simple part of this play, but they are nevertheless important in terms of flexibility.

Next up, I'll give an example of how the mechanism works, from an end-programmer point of view. Stay tuned.

Monday, November 13, 2006

Cross thread calls in native C++, #1

As mentioned a few days ago, I intend to write down some of my ponderings and works on how to make synchronized calls across threads. The motivation for such mechanisms is 1. to simplify inter-thread notifications, and 2. to avoid cluttering classes and functions with more synchronization code than what's absolutely necessary. I assume that you, the reader, are at least vaguely familiar with threads, and all the pitfalls they introduce when common data is being processed. A classical example is the worker thread which fires off a callback function in a GUI class, to render some updated output. There are a bunch of different approaches, let alone patterns (e.g. Observer), to use in this case. I'll completely disregard the patterns, and focus on the actual data and notification.

Imagine the worker class Worker, and the GUI class SomeWindow. How they are associated makes little or no difference, what's important is that Worker is supposed to call a function, and/or update data in SomeWindow. The application has two threads. One "resides" in Worker, and the other in SomeWindow. Let's say that at a given point in time, the Worker object decides to make a notification to SomeWindow. How can this be done? I can sum up a few of the possible approaches, including major pros/cons.

  1. Worker accesses, and updates, a data member in SomeWindow.
    • Pros: It's quick.
    • Cons: It's dirty. More specifically, it breaks encapsulation. If this operation is done without some kind of interlocking (mutex / criticalsection / semaphore / etc.), the worker and window threads may both try to access the data member at once, and that is most certain to wreak havoc on our application. If we're lucky, it'll just cause an access violation. If SomeWindow exposes an object for interlocking, we break the encapsulation even further, unleashing ghosts such as deadlocks.
  2. Worker calls a function within SomeWindow, which updates a data member for us.
    • Pros: Granted the proper interlocking, it's relatively safe.
    • Cons: SomeWindow will be bloated with code for interlocking, in the worst possible case, one lock object per updatable piece of member data. It also arguably weakens the cohesion, by introduction of those very locks. Dealing with the complexities of threads, interlocking and synchronization in a verbose way is simply not very ideal in a GUI class.
  3. Worker sends a Window Message to SomeWindow, with the update data in a structure. SomeWindow deals with the message and somehow handles the data.
    • Pros: Relatively safe, if SendMessage is used.
    • Cons: Cohesion slightly weakened. Parameter translation and transport can become tiresome, as custom or generic structures are needed for each unique value lineup. The most prominent drawback of this approach is the link to window messages; it's not really practical for non-GUI scenarios.
  4. Worker calls a function within SomeWindow, which updates a data member for us, by use of a synchronized re-call.
    • Pros: Safe. Relatively effective. No bloat worth mentioning.
    • Cons: Cohesion slightly weakened. The code fundament is a wee bit more complex than it would be without the threads, but it's by no means incomprehensible, and the end-of-the-line code will be quite pleasant.
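
To give approach 2 a face, here is a small sketch of what the bloat looks like. This SomeWindow is hypothetical and has nothing GUI-like about it; it assumes C++11 and std::mutex in place of a critical section, and reportFromWorkers merely plays the part of a few Worker threads.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical SomeWindow in the style of approach 2: each updatable
// member is paired with a lock of its own.
class SomeWindow
{
    std::mutex statusLock_;
    std::string status_;
    std::mutex progressLock_;
    int progress_ = 0;

public:
    void setStatus(const std::string& status)
    {
        std::lock_guard<std::mutex> guard(statusLock_);
        status_ = status;
    }

    std::string status()
    {
        std::lock_guard<std::mutex> guard(statusLock_);
        return status_;
    }

    void addProgress(int amount)
    {
        std::lock_guard<std::mutex> guard(progressLock_);
        progress_ += amount;
    }

    int progress()
    {
        std::lock_guard<std::mutex> guard(progressLock_);
        return progress_;
    }
};

// Several worker threads report progress at once; returns the total
// the window ended up with.
int reportFromWorkers(SomeWindow& window, int workerCount, int stepsPerWorker)
{
    assert(workerCount > 0 && stepsPerWorker >= 0);
    std::vector<std::thread> workers;
    for(int w = 0; w < workerCount; ++w)
        workers.emplace_back([&window, stepsPerWorker] {
            for(int s = 0; s < stepsPerWorker; ++s)
                window.addProgress(1);
            window.setStatus("working");
        });
    for(std::thread& t : workers)
        t.join();
    return window.progress();
}
```

It works, and the count comes out right, but every new member means another lock and another pair of guarded accessors. That is the clutter option four tries to get rid of.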

What this and future posts will dig into is option number four. I'll also present a templated framework I've been constructing, which allows us to create this kind of functionality with as minuscule a headache as possible. I'll leave it hanging here for now, though. Look back for episode two shortly.

Thursday, November 9, 2006

Put a hex on that dump

This is actually a snippet I posted to my forum last year, but seeing as my forum is a horrible place to read, I'll re-post it here. Why my forum is horrible? Well it's spam infested, for one. And I'm currently all too busy to move my site to the new colo, which has a brand new ASP.NET site + community server waiting. Ok, that wasn't entirely true -- the site isn't completed yet (... busy and all that). Anyhoo, here's the real content:

#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

std::string hexdump(void* x, unsigned long len, unsigned int w)
{
    std::ostringstream osDump;
    std::ostringstream osNums;
    std::ostringstream osChars;
    std::string szPrevNums;
    bool bRepeated = false;
    unsigned long i;

    for(i = 0; i <= len; i++)
    {
        if(i < len)
        {
            char c = (char)*((char*)x + i);
            unsigned int n = (unsigned int)*((unsigned char*)x + i);
            osNums << std::setbase(16) << std::setw(2) << std::setfill('0') << n << " ";
            if(((i % w) != w - 1) && ((i % w) % 8 == 7))
                osNums << "- ";
            // Cast first: iscntrl is undefined for negative char values
            osChars << (iscntrl((unsigned char)c) ? '.' : c);
        }

        // A row identical to the previous one is folded into a single "*"
        if(osNums.str().compare(szPrevNums) == 0)
        {
            bRepeated = true;
            osNums.str("");
            osChars.str("");
            if(i == len - 1)
                osDump << "*" << std::endl;
            continue;
        }

        // Emit the row once it is full, or when the data ends
        if(((i % w) == w - 1) || ((i == len) && (osNums.str().size() > 0)))
        {
            if(bRepeated)
            {
                osDump << "*" << std::endl;
                bRepeated = false;
            }
            osDump << std::setbase(16) << std::setw(8) << std::setfill('0') << (i - (i % w)) << " "
                << std::setfill(' ') << std::setiosflags(std::ios_base::left)
                << std::setw(3 * w + ((w / 8) - 1) * 2) << osNums.str()
                << " |" << osChars.str() << std::resetiosflags(std::ios_base::left) << "|" << std::endl;
            szPrevNums = osNums.str();
            osNums.str("");
            osChars.str("");
        }
    }

    // Final line: the total number of bytes dumped
    osDump << std::setbase(16) << std::setw(8) << std::setfill('0') << (i-1) << std::endl;

    return osDump.str();
}

x is the base memory location for the dump.
len is the number of bytes to dump.
w is the number of bytes to display per line.
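
For comparison, here is a stripped-down sketch of the row layout on its own: offset, hex bytes, then the printable characters. formatHexRow is a hypothetical helper, not part of the snippet above, and it leaves out both the "- " group separator and the folding of repeated rows.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Formats one row: 8-digit hex offset, the bytes in hex, then the
// printable characters between '|' markers.
std::string formatHexRow(const unsigned char* data, std::size_t count, std::size_t offset)
{
    assert(data != nullptr || count == 0);
    std::ostringstream row;
    row << std::hex << std::setfill('0') << std::setw(8) << offset << ' ';
    for(std::size_t i = 0; i < count; ++i)
        row << std::setw(2) << static_cast<unsigned int>(data[i]) << ' ';
    row << '|';
    for(std::size_t i = 0; i < count; ++i)
        row << static_cast<char>(std::isprint(data[i]) ? data[i] : '.');
    row << '|';
    return row.str();
}
```

Fed the four bytes 'H', 'i', 0x00 and 0xff at offset 0x10, this gives "00000010 48 69 00 ff |Hi..|" in the default "C" locale. Note that it uses isprint rather than iscntrl, so bytes above 0x7f show up as dots instead of code page characters.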

Before you go ahead and cry "code smell", let me pull out the example output:

00000000  00 01 02 03 04 05 06 07 - 08 09 0a 0b 0c 0d 0e 0f  |................|
00000010 10 11 12 13 14 15 16 17 - 18 19 1a 1b 1c 1d 1e 1f |................|
00000020 20 21 22 23 24 25 26 27 - 28 29 2a 2b 2c 2d 2e 2f | !"#$%&'()*+,-./|
00000030 30 31 32 33 34 35 36 37 - 38 39 3a 3b 3c 3d 3e 3f |0123456789:;<=>?|
00000040 40 41 42 43 44 45 46 47 - 48 49 4a 4b 4c 4d 4e 4f |@ABCDEFGHIJKLMNO|
00000050 50 51 52 53 54 55 56 57 - 58 59 5a 5b 5c 5d 5e 5f |PQRSTUVWXYZ[\]^_|
00000060 60 61 62 63 64 65 66 67 - 68 69 6a 6b 6c 6d 6e 6f |`abcdefghijklmno|
00000070 70 71 72 73 74 75 76 77 - 78 79 7a 7b 7c 7d 7e 7f |pqrstuvwxyz{|}~.|
00000080 80 81 82 83 84 85 86 87 - 88 89 8a 8b 8c 8d 8e 8f |ÇüéâäàåçêëèïîìÄÅ|
00000090 90 91 92 93 94 95 96 97 - 98 99 9a 9b 9c 9d 9e 9f |ÉæÆôöòûùÿÖÜø£Ø׃|
000000a0 a0 a1 a2 a3 a4 a5 a6 a7 - a8 a9 aa ab ac ad ae af |áíóúñѪº¿®¬½¼¡«»|
000000b0 b0 b1 b2 b3 b4 b5 b6 b7 - b8 b9 ba bb bc bd be bf |¦¦¦¦¦ÁÂÀ©¦¦++¢¥+|
000000c0 c0 c1 c2 c3 c4 c5 c6 c7 - c8 c9 ca cb cc cd ce cf |+--+-+ãÃ++--¦-+¤|
000000d0 d0 d1 d2 d3 d4 d5 d6 d7 - d8 d9 da db dc dd de df |ðÐÊËÈiÍÎÏ++¦_¦Ì¯|
000000e0 e0 e1 e2 e3 e4 e5 e6 e7 - e8 e9 ea eb ec ed ee ef |ÓßÔÒõÕµþÞÚÛÙýݯ´|
000000f0 f0 f1 f2 f3 f4 f5 f6 f7 - f8 f9 fa fb fc fd fe ff |­±=¾¶§÷¸°¨·¹³²¦ |
00000100 61 61 61 61 61 61 61 61 - 61 61 61 61 61 61 61 61 |aaaaaaaaaaaaaaaa|
00000120 61 61 61 61 61 61 62 62 - 62 62 62 62 62 62 62 62 |aaaaaabbbbbbbbbb|

The output looks a tad borked here, but it really does print quite nicely to the console. Either way, there it is. I might tidy the whole thing up at some point, but up until now it's made up a not-very-critical piece of my codebase, and is as such not eligible for the same prudent treatment as the rest of it. Feel free to give me a heads up if you happen to spot anything out of the ordinary, or have any good additions in mind. Remember, though: never write code this poorly factored, magic number infested and generally messy. It'll eat your brain. Read my "Message Only Window" article at CodeGuru for a bunch of pointers on good practices for writing code.

Btw, see for real-world use of the function.

Sunday, November 5, 2006

"Where did the function call originate from?" version 2.0

In one of the earlier posts here I showed how the _ReturnAddress intrinsic could be used to see which module a call originated from. In this minimalistic episode, I'll show how to resolve the return address to a symbol (granted the symbol information, of course).

The following function will attempt to resolve the name of the symbol at the referenced address. It can easily be merged with the previously mentioned _ReturnAddress function, in a way such as getSymbolName(GetCurrentProcess(), _ReturnAddress());

// Requires <windows.h> and <dbghelp.h>; link with dbghelp.lib
string getSymbolName(HANDLE hProcess, DWORD64 dwAddress)
{
    static BOOL bSymbolsLoaded = FALSE;
    if(!bSymbolsLoaded)
    {
        // Replace the second parameter of the following call with the path of the
        // folder in which symbols for the process can be found. NULL will
        // cause the current working directory to be searched.
        if(!SymInitialize(GetCurrentProcess(), NULL, TRUE))
            throw exception("Symbols could not be loaded");
        bSymbolsLoaded = TRUE;
    }

    // Buffer large enough for SYMBOL_INFO plus the longest symbol name
    ULONG64 buf[(sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(TCHAR) +
        sizeof(ULONG64) - 1) / sizeof(ULONG64)];

    SYMBOL_INFO* pSI = reinterpret_cast<SYMBOL_INFO*>(buf);

    pSI->SizeOfStruct = sizeof(SYMBOL_INFO);
    pSI->MaxNameLen = MAX_SYM_NAME;

    DWORD64 dwDisplacement;
    if(!SymFromAddr(hProcess, dwAddress, &dwDisplacement, pSI))
        throw exception("Failed to retrieve the symbol information");

    return static_cast<char*>(pSI->Name);
}

In case of the previous example, the first "Where did the function call originate from?" post, the expected output would include the module name only. The preceding snippet will allow you to show the symbol name as well, that is the name of the calling function (which would be main in case of the expected output of the previous post).

A keen observer browsing through the DbgHelp API will also notice that iterating the loaded modules is no longer needed in that old example. While this is true if you've got the symbols, and actually use the DbgHelp API, my use of the approach has had the luxury of neither. I might elaborate on that alongside a description of my API Hooking library :)

Updates enroute

I'm working on a post about various ways to do synchronized cross thread function calls, using scheduler-like mechanisms. I've got a tendency to write a whole lot more than I should, so even when I try to keep this short, it's getting bloated. This post serves as both a warning and a reminder. A reminder that I'm still alive, and a warning that content is coming (... or vice versa.)

In other news, I received a nomination for the MVP award from Microsoft yesterday. It caught me somewhat off guard, as I hadn't considered myself doing anything out of the ordinary. Regardless of whether or not I actually get the award, I hope that my writings (be it here, on Codeguru or the MSDN forums) are actually helping (or amusing) someone, and that alone is reason enough for me to continue. For more information on what the MVP award is, see