On x64, function call stack alignment is 16 bytes.
On ARM, 8 bytes.
On x86, 4 bytes.
Contiguous double-word CAS on x86 requires 8-byte alignment.
It appears that MSVC offers no way to control stack alignment (recent versions of GCC do support this, though).
Accordingly, on x86, I either pass in the address of an already aligned variable and dereference it, or pass by value and copy into a correctly aligned local variable.
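
The two options can be sketched roughly as follows. This is an illustration only, not the actual code: it uses C11 `<stdatomic.h>` and `alignas` rather than MSVC intrinsics, treats the double word as a single `uint64_t` (two contiguous 32-bit words on x86), and the function names `dcas_by_pointer` and `dcas_by_value` are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Option 1 (hypothetical name): the caller passes the address of a
 * comparand it has already aligned to 8 bytes; we only dereference it. */
static bool dcas_by_pointer(_Atomic uint64_t *destination,
                            uint64_t *aligned_compare,
                            uint64_t exchange)
{
    /* The caller is responsible for the alignment; check that assumption. */
    assert(((uintptr_t)aligned_compare % 8) == 0);
    return atomic_compare_exchange_strong(destination, aligned_compare, exchange);
}

/* Option 2 (hypothetical name): the comparand arrives by value, so its
 * stack slot may be only 4-byte aligned on x86. Copy it into a local
 * that is explicitly aligned before using it in the CAS. */
static bool dcas_by_value(_Atomic uint64_t *destination,
                          uint64_t compare,
                          uint64_t exchange)
{
    alignas(8) uint64_t local_compare = compare;

    assert(((uintptr_t)&local_compare % 8) == 0);
    return atomic_compare_exchange_strong(destination, &local_compare, exchange);
}
```

Both functions return true when the exchange happened. The second trades an extra copy for not placing any alignment burden on the caller.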