Thunking in x64.. Oh woe is me.
When the Thunk32 library was released on CodeProject back in December, Todd Smith promptly asked for x64 support. While it didn't initially strike me as being too much of a struggle, it occurred to me some time later that the x64 fastcalls would make a mess of things.
Just to bring you up to speed, here's an x64 crash course:
- The new quadword registers have got a leading "R", such as RCX, RAX and so forth.
- In addition to new flavors of old registers, there are eight new general purpose registers: R8 through R15. These can be accessed as 32/16/8 bit with a D/W/B suffix, respectively (R8D, R8W, R8B).
- The volatile registers are RAX, RCX, RDX, R8, R9, R10, R11 and XMM0 through XMM5. All others are non-volatile: a function that uses them must restore their values before returning, so they do *not* change across calls.
- On Windows, fastcall is the one and only calling convention. The first four arguments are passed in registers, chosen by position: integer arguments go in RCX, RDX, R8 and R9, and floating point arguments in XMM0 through XMM3 (128-bit SSE registers). Spill space for these four is always reserved on the stack, and all following params are passed on the stack. The caller always cleans the stack. On entry, the callee sees the following stack layout (lowest address first):
[RETURN ADDRESS]
[SPILL FOR PARAM1]
[SPILL FOR PARAM2]
[SPILL FOR PARAM3]
[SPILL FOR PARAM4]
[PARAM5]
[PARAM6]
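To make the layout above a bit more concrete, here's a minimal sketch that answers "where does argument N live when the callee starts running?". The names `ArgLocation` and `locate_argument` are made up for this post and are not part of Thunk32; the sketch also assumes every argument fits in a single 8-byte slot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type: where the callee finds one argument on entry,
// i.e. right after CALL has pushed the return address.
struct ArgLocation {
    std::string reg;           // register name, empty if stack-only
    std::size_t stack_offset;  // byte offset of the argument's slot from RSP
};

// index is 0-based. Assumes each argument occupies exactly one 8-byte slot.
inline ArgLocation locate_argument(int index, bool is_floating_point) {
    assert(index >= 0);
    static const char* const int_regs[]   = {"RCX", "RDX", "R8", "R9"};
    static const char* const float_regs[] = {"XMM0", "XMM1", "XMM2", "XMM3"};

    ArgLocation loc;
    // [RSP] holds the return address; the spill slots and the stack
    // parameters follow it, one 8-byte slot per argument.
    loc.stack_offset = 8 + 8 * static_cast<std::size_t>(index);
    if (index < 4) {
        // Registers are picked by position, not by counting per type:
        // a floating point second argument uses XMM1, never XMM0.
        loc.reg = is_floating_point ? float_regs[index] : int_regs[index];
    }
    return loc;
}
```

For example, `locate_argument(1, true)` reports XMM1 with a spill slot at RSP+16, while `locate_argument(4, false)` has no register and lives at RSP+40, which is the [PARAM5] slot in the diagram.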
That's it for the recap, and here are the woes: While x64 itself is well and dandy, the calling convention brings trouble. For member function calls, RCX contains the this pointer, which means that to turn a non-member call into a member call, every argument must shift one slot over: RCX moves to RDX, RDX to R8, R8 to R9, and R9 onto the stack. This essentially means that the stack must be modified prior to the call being made, and restored before returning to the original caller. Consequently the original return address must be stored, and all volatile registers taken care of appropriately. In addition to this, there's the issue of floating point and integer parameters being passed in different registers, so the shuffle depends on the type of each argument. All added up: it's relatively safe to say that the same thunk cannot be used regardless of parameter count and data type, and that the absolute worst case will be a fairly massive piece of code.
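The register shuffle is easier to see when modeled in plain C++. The sketch below is only a simulation, not thunk code: `CallFrame` and `insert_this_pointer` are hypothetical names, it handles integer arguments only, and it assumes the argument count is known up front (as it would be when a thunk is generated for one specific signature).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the integer argument slots of a single call.
struct CallFrame {
    std::array<std::uint64_t, 4> regs;  // RCX, RDX, R8, R9 in that order
    std::vector<std::uint64_t> stack;   // PARAM5, PARAM6, ... in that order
};

// Returns the frame a member function expects: `this` in RCX and every
// original argument moved one slot to the right.
inline CallFrame insert_this_pointer(const CallFrame& in,
                                     std::uint64_t this_ptr,
                                     std::size_t arg_count) {
    // Arguments beyond the fourth must already be on the stack.
    assert(arg_count <= 4 || in.stack.size() == arg_count - 4);

    CallFrame out;
    out.regs = {this_ptr, in.regs[0], in.regs[1], in.regs[2]};
    if (arg_count >= 4) {
        // The old fourth argument no longer fits in a register, so it
        // becomes the first stack parameter, ahead of the existing ones.
        out.stack.push_back(in.regs[3]);
        out.stack.insert(out.stack.end(), in.stack.begin(), in.stack.end());
    }
    return out;
}
```

With three or fewer arguments the stack is left alone, which is the cheap case. From four arguments onwards the stack grows by one slot, and that is exactly where the return address juggling described above comes in.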
As for the Thunk64 library: it's not completed yet. While I've implemented working 64 bit thunks, their size and look feel like too much to bear at the moment. That being said, I'll disclose some of my thoughts here as I move along, and hopefully the final result will turn out ok. Let this post serve as a warning of what is to come.
(I suggest you bring a shovel next time.)

