Hendler, Shavit, Yerushalmi - A Scalable Lock-Free Stack Algorithm
I've not read the paper beyond the introduction, but the underlying concept is obvious (once someone has told you). The performance problem with a stack is the contention on the stack head pointer. If we imagine a thread performing a push, and a thread performing a pop, and somehow those two threads could be aware of each other, then those two threads could cancel each other out, each servicing the other, without having to touch the stack itself, thus removing contention from the stack head pointer.
I'm a bit puzzled by this, because it seems to me useful only for freelists (where elements are wholly interchangable) rather than stacks (where we may care about the order in which elements enter/emerge from the stack, i.e. we are relying on its FILO property).