Friday, April 6, 2007

Thunking in x64.. Oh woe is me.

When the Thunk32 library was released on CodeProject back in December, Todd Smith promptly asked for x64 support. While it didn't initially strike me as being too much of a struggle, it occurred to me some time later that the x64 fastcalls would make a mess of things.

Just to bring you up to speed, here's an x64 crash course:
  • The new quadword registers have got a leading "R", such as RCX, RAX and so forth.
  • In addition to new flavors of old registers, there are eight new general purpose registers: R8 through R15. These can be accessed as 32/16/8 bit with a D/W/B suffix, respectively (R8D, R8W, R8B).
  • The volatile registers are RAX, RCX, RDX, R8, R9, R10, R11 and XMM0 through XMM5. All others are non-volatile: a callee that uses them must restore them before returning, so they must *not* change across calls.
  • Fastcall is the one and only calling convention, through which the first four arguments are passed in registers, picked by position: integer arguments go in RCX, RDX, R8 and R9, and floating point arguments in XMM0 through XMM3 (128 bit SSE2 registers). The caller reserves 32 bytes of spill space on the stack for these four, and all following params go on the stack. The caller always cleans the stack.
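To make the register rules a bit more concrete, here's a minimal sketch of how parameters map to locations, assuming the Microsoft x64 convention for a non-variadic function. The `ParamKind` enum and `assignParamLocations` helper are made up for illustration and are not part of Thunk32.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified parameter classification: integer/pointer vs. float/double.
enum class ParamKind { Integer, Float };

// Hypothetical helper: returns the location of each parameter. The register
// is picked by the parameter's position, so a float in position 2 uses XMM1
// and leaves RDX unused.
std::vector<std::string> assignParamLocations(const std::vector<ParamKind>& params) {
    static const char* const intRegs[] = {"RCX", "RDX", "R8", "R9"};
    static const char* const fpRegs[]  = {"XMM0", "XMM1", "XMM2", "XMM3"};

    std::vector<std::string> where;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i < 4) {
            where.push_back(params[i] == ParamKind::Float ? fpRegs[i] : intRegs[i]);
        } else {
            where.push_back("stack");  // fifth parameter and onwards
        }
    }
    return where;
}
```

For a signature like (int, double, int, double, int), this yields RCX, XMM1, R8, XMM3 and finally the stack.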
A typical stack layout, given a 6 param x64 call, could be:
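
As a rough sketch of such a layout, assuming the Microsoft x64 convention and integer-sized parameters only, the snippet below lists what sits where on the stack at the moment the callee is entered. `StackSlot` and `describeEntryStack` are hypothetical names used for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One 8-byte stack slot, as seen by the callee at function entry.
struct StackSlot {
    std::size_t offset;    // byte offset from RSP at callee entry
    std::string contents;  // what the slot holds
};

// Hypothetical helper: describes the stack at callee entry for a call with
// `paramCount` integer-sized parameters.
std::vector<StackSlot> describeEntryStack(std::size_t paramCount) {
    static const char* const regs[] = {"RCX", "RDX", "R8", "R9"};

    std::vector<StackSlot> slots;
    slots.push_back({0, "return address"});

    // The four spill (home) slots are always reserved by the caller, even
    // when the function takes fewer than four parameters.
    for (std::size_t i = 0; i < 4; ++i) {
        slots.push_back({8 + i * 8, std::string("home slot for ") + regs[i]});
    }

    // Parameters five and up are passed on the stack, above the spill space.
    for (std::size_t i = 4; i < paramCount; ++i) {
        slots.push_back({8 + i * 8, "param " + std::to_string(i + 1)});
    }
    return slots;
}
```

For a 6 param call, this places the return address at RSP+0, the four spill slots at RSP+8 through RSP+32, and params 5 and 6 at RSP+40 and RSP+48.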


That's it for the recap, and here are the woes: While x64 itself is well and dandy, the calling convention brings trouble. For member function calls, RCX contains the this pointer, which means that for a non-member call to go member, RCX must be moved to RDX, RDX to R8, R8 to R9, and R9 onto the stack. This essentially means that the stack must be modified prior to the call being made, and restored before returning to the original caller. Consequently the original return address must be stored, and all volatile registers taken care of appropriately. In addition to this, there's the issue of floating point and integer parameters being passed in different registers. All added up: it's relatively safe to say that the same thunk cannot be used regardless of parameter count and data type, and that the absolute worst case will be a fairly massive piece of code.
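
The register shuffle described above can be modeled in plain C++. This is only a simulation of the data movement, not thunk code: it assumes the original call has at least four integer parameters, ignores floating point arguments entirely, and uses the made-up names `IntArgs` and `insertThisPointer`.

```cpp
#include <cstdint>
#include <vector>

// Simplified model of the integer argument state for an x64 call.
struct IntArgs {
    std::uint64_t rcx = 0, rdx = 0, r8 = 0, r9 = 0;
    std::vector<std::uint64_t> stack;  // stack-passed params, fifth param first
};

// Hypothetical helper: turns the argument state of a non-member call into
// that of a member call by inserting the this pointer in front.
IntArgs insertThisPointer(const IntArgs& in, std::uint64_t thisPtr) {
    IntArgs out;

    // The old fourth param no longer fits in a register, so it becomes the
    // first stack param. This is what forces the thunk to modify the stack.
    out.stack.reserve(in.stack.size() + 1);
    out.stack.push_back(in.r9);
    out.stack.insert(out.stack.end(), in.stack.begin(), in.stack.end());

    // Every register param moves one position to the right.
    out.r9  = in.r8;
    out.r8  = in.rdx;
    out.rdx = in.rcx;
    out.rcx = thisPtr;
    return out;
}
```

A real thunk has to do the same thing in machine code, on top of saving the original return address and undoing the stack change afterwards, which is where the size comes from.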

As for the Thunk64 library, it's not completed yet. While I've implemented working 64 bit thunks, their size and look feel like too much to bear at the moment. That being said, I'll disclose some of my thoughts here as I move along, and hopefully the final result will turn out ok. Let this post serve as a warning of what is to come.

(I suggest you bring a shovel next time.)

ThreadSynch on

I've got some changes in the pipeline for the thread synchronization library. The time frame for the changes hasn't been decided on yet, but I did see fit to move the project over to Google's project hosting. I considered using Microsoft's CodePlex, but seeing as Google has a Subversion repository ready to roll, I turned to Googlecode. If I spot anything I dislike, I'll be sure to mention that here.

The library is still in a fairly early (though stable) state, but I expect that to change over the next few weeks and months. If you feel like contributing, or influencing the changes, head over to the project page and sign up.

The first planned addition is late completion of tasks. The details are still unclear, but I've got the requirements worked out. More on that later.

CodeProject article about ThreadSynch:
Googlecode project page: