It is known that on x86, for load() and store() operations, the memory orders memory_order_consume, memory_order_acquire, memory_order_release, and memory_order_acq_rel do not require any processor instructions for the cache and pipeline; the generated assembly always corresponds to std::memory_order_relaxed. These constraints matter only for compiler optimization: http://www.stdthread.co.uk/forum/index.php?topic=72.0
And this disassembly confirms it for store() (MSVS 2012, x86_64):
std::atomic<int> a;
a.store(0, std::memory_order_relaxed);
000000013F931A0D mov dword ptr [a],0
a.store(1, std::memory_order_release);
000000013F931A15 mov dword ptr [a],1
But this disassembly does not confirm it for load() (MSVS 2012, x86_64), which uses lock cmpxchg:
int val = a.load(std::memory_order_acquire);
000000013F931A1D prefetchw [a]
000000013F931A22 mov eax,dword ptr [a]
000000013F931A26 mov edx,eax
000000013F931A28 lock cmpxchg dword ptr [a],edx
000000013F931A2E jne main+36h (013F931A26h)
std::cout << val << "\n";
Anthony Williams writes: "some_atomic.load(std::memory_order_acquire) does just drop through to a simple load instruction, and some_atomic.store(std::memory_order_release) drops through to a simple store instruction."
Where am I wrong? Do the semantics of std::memory_order_acquire require a processor instruction such as lock cmpxchg on x86/x86_64, or only a simple load instruction mov, as Anthony Williams says?
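For reference, a complete test program along these lines would look roughly as follows; this is a minimal sketch, and the surrounding main() and variable names are my reconstruction, since the original code around the snippets is not shown:

#include <atomic>
#include <iostream>

std::atomic<int> a;

int main() {
    a.store(0, std::memory_order_relaxed);        // expected: plain mov
    a.store(1, std::memory_order_release);        // expected: plain mov
    int val = a.load(std::memory_order_acquire);  // expected: plain mov,
                                                  // but MSVS 2012 emits lock cmpxchg
    std::cout << val << "\n";
    return 0;
}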
UPDATE: It is the same as this bug report: http://connect.microsoft.com/VisualStudio/feedback/details/770885
c++ c++11 concurrency x86 memory-barriers
asked Sep 2 '13 at 15:55
I'm not sure looking at what the compiler generates is necessarily a good way to determine the requirements of a particular functionality - it's not unheard of that compilers do "more than they need to". – Mats Petersson Sep 2 '13 at 16:04
@Mats Petersson Yes, but there is nothing easier than doing nothing. And nothing but a mov was required from the compiler. Have the developers of Microsoft really failed at this simplest task: "do nothing"? :) – Alex Sep 2 '13 at 16:14
I know MS VC (at least SOME versions) will generate extra "locking" on variables declared as volatile - not because the C++ standard requires it, but because some bits of code that USED to work on single-core processors suddenly work poorly on SMP systems. This looks similar to one of those situations. – Mats Petersson Sep 2 '13 at 16:16
@Mats Petersson All right. But volatile appeared a long time ago, when nothing was known about std::memory_order. And to avoid unnecessary calls to the WinAPI or assembler code, they decided to use barriers (lock) for volatile - these three solutions are all equally inelegant. But now, with the new C++11 standard, everything is clearly defined and there is one elegant solution: mov. Maybe older x86 processors require lock for load()? – Alex Sep 2 '13 at 16:26
Is it the same as this bug report? connect.microsoft.com/VisualStudio/feedback/details/770885 – jcoder Sep 2 '13 at 16:42
1 Answer
No. The semantics of std::memory_order_acquire do not require any processor instructions on x86/x86_64.
On x86_64, no load()/store() operation requires processor instructions (lock/fence), with a single exception: atomic.store(val, std::memory_order_seq_cst), which requires (LOCK) XCHG or, alternatively, MOV (into memory) followed by MFENCE.
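To illustrate that one exception, here is a sketch contrasting the two stores; the instruction comments follow the mapping linked below, and the exact output depends on the compiler:

#include <atomic>

std::atomic<int> a;

void stores(int val) {
    // Release store: compiles to a plain mov on x86_64;
    // the ordering only constrains the compiler.
    a.store(val, std::memory_order_release);

    // Sequentially consistent store: the one case that needs a full
    // barrier on x86_64 - either "xchg" (implicitly locked) or
    // "mov" followed by "mfence".
    a.store(val, std::memory_order_seq_cst);
}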
Processor memory-barrier instructions for x86 (except CAS), and also for ARM and PowerPC: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
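And to show why the memory orders still matter even though they compile to plain mov on x86, here is a sketch of release/acquire message passing; the names data, ready, producer, and consumer are mine:

#include <atomic>

int data;                          // ordinary, non-atomic variable
std::atomic<bool> ready{false};

void producer() {
    data = 42;                                     // ordinary store
    ready.store(true, std::memory_order_release);  // plain mov on x86, but the
                                                   // compiler may not hoist it
                                                   // above "data = 42"
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) // plain mov on x86
        ;                                          // spin until ready
    // Here the acquire/release pairing guarantees data == 42.
}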
Disassembly of load(), GCC 4.8.1 x86_64 (GDB):
temp = a.load(std::memory_order_relaxed);
temp = a.load(std::memory_order_acquire);
temp = a.load(std::memory_order_seq_cst);
0x46140b <+0x007b> mov 0x38(%rsp),%ebx
0x46140f <+0x007f> mov 0x34(%rsp),%esi
0x461413 <+0x0083> mov 0x30(%rsp),%edx
Disassembly of store(), GCC 4.8.1 x86_64 (GDB):
a.store(temp, std::memory_order_relaxed);
a.store(temp, std::memory_order_release);
a.store(temp, std::memory_order_seq_cst);
0x4613dc <+0x004c> mov %eax,0x20(%rsp)
0x4613e0 <+0x0050> mov 0x38(%rsp),%eax
0x4613e4 <+0x0054> mov %eax,0x20(%rsp)
0x4613e8 <+0x0058> mov 0x38(%rsp),%eax
0x4613ec <+0x005c> mov %eax,0x20(%rsp)
0x4613f0 <+0x0060> mfence
0x4613f3 <+0x0063> mov %ebx,0x20(%rsp)
Disassembly of load(), MSVS 2012 x86_64 - it is the same as in this bug report: http://connect.microsoft.com/VisualStudio/feedback/details/770885:
temp = a.load(std::memory_order_relaxed);
000000013FE51A1F prefetchw [a]
000000013FE51A24 mov eax,dword ptr [a]
000000013FE51A28 nop dword ptr [rax+rax]
000000013FE51A30 mov ecx,eax
000000013FE51A32 lock cmpxchg dword ptr [a],ecx
000000013FE51A38 jne main+40h (013FE51A30h)
000000013FE51A3A mov dword ptr [temp],eax
temp = a.load(std::memory_order_acquire);
000000013FE51A3E prefetchw [a]
000000013FE51A43 mov eax,dword ptr [a]
000000013FE51A47 nop word ptr [rax+rax]
000000013FE51A50 mov ecx,eax
000000013FE51A52 lock cmpxchg dword ptr [a],ecx
000000013FE51A58 jne main+60h (013FE51A50h)
000000013FE51A5A mov dword ptr [temp],eax
temp = a.load(std::memory_order_seq_cst);
000000013FE51A5E prefetchw [a]
000000013FE51A63 mov eax,dword ptr [a]
000000013FE51A67 nop word ptr [rax+rax]
000000013FE51A70 mov ecx,eax
000000013FE51A72 lock cmpxchg dword ptr [a],ecx
000000013FE51A78 jne main+80h (013FE51A70h)
000000013FE51A7A mov dword ptr [temp],eax
Disassembly of store(), MSVS 2012 x86_64:
a.store(temp, std::memory_order_relaxed);
000000013F8C1A58 mov eax,dword ptr [temp]
000000013F8C1A5C mov dword ptr [a],eax
a.store(temp, std::memory_order_release);
000000013F8C1A60 mov eax,dword ptr [temp]
000000013F8C1A64 mov dword ptr [a],eax
a.store(temp, std::memory_order_seq_cst);
000000013F8C1A68 mov eax,dword ptr [temp]
000000013F8C1A6C xchg eax,dword ptr [a]