It is known that on x86, for load() and store() operations, the memory orders memory_order_consume, memory_order_acquire, memory_order_release, and memory_order_acq_rel do not require any processor instructions for the cache and pipeline; the generated assembly always corresponds to std::memory_order_relaxed. These constraints matter only for compiler optimization: http://www.stdthread.co.uk/forum/index.php?topic=72.0
And this disassembly confirms it for store() (MSVS 2012, x86_64):
std::atomic<int> a;
a.store(0, std::memory_order_relaxed);
000000013F931A0D mov dword ptr [a],0
a.store(1, std::memory_order_release);
000000013F931A15 mov dword ptr [a],1
But this disassembly does not confirm it for load() (MSVS 2012, x86_64), which uses lock cmpxchg:
int val = a.load(std::memory_order_acquire);
000000013F931A1D prefetchw [a]
000000013F931A22 mov eax,dword ptr [a]
000000013F931A26 mov edx,eax
000000013F931A28 lock cmpxchg dword ptr [a],edx
000000013F931A2E jne main+36h (013F931A26h)
std::cout << val << "\n";
Anthony Williams writes: "some_atomic.load(std::memory_order_acquire) does just drop through to a simple load instruction, and some_atomic.store(std::memory_order_release) drops through to a simple store instruction."
Where am I wrong? Do the semantics of std::memory_order_acquire require a processor instruction such as lock cmpxchg on x86/x86_64, or only a simple load instruction mov, as Anthony Williams says?
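For reference, a complete test program along these lines would look roughly as follows; this is a minimal sketch, and the surrounding main() and variable names are my reconstruction, since the original code around the snippets is not shown:

#include <atomic>
#include <iostream>

std::atomic<int> a;

int main() {
    a.store(0, std::memory_order_relaxed);        // expected: plain mov
    a.store(1, std::memory_order_release);        // expected: plain mov
    int val = a.load(std::memory_order_acquire);  // expected: plain mov,
                                                  // but MSVS 2012 emits lock cmpxchg
    std::cout << val << "\n";
    return 0;
}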
UPDATE: It is the same as this bug report: http://connect.microsoft.com/VisualStudio/feedback/details/770885
c++ c++11 concurrency x86 memory-barriers
asked Sep 2 '13 at 15:55
I'm not sure looking at what the compiler generates is necessarily a good way to determine the requirements of a particular functionality - it's not unheard of that compilers do "more than they need to". – Mats Petersson Sep 2 '13 at 16:04
@Mats Petersson Yes, but there is nothing easier than doing nothing. And nothing but a mov was required from the compiler. Have the developers of Microsoft really failed at this simplest task: "do nothing"? :) – Alex Sep 2 '13 at 16:14
I know MS VC (at least SOME versions) will generate extra "locking" on variables declared as volatile - not because the C++ standard requires it, but because some bits of code that USED to work on single-core processors suddenly work poorly on SMP systems. This looks similar to one of those situations. – Mats Petersson Sep 2 '13 at 16:16
@Mats Petersson All right. But volatile appeared a long time ago, when nothing was known about std::memory_order. And to avoid unnecessary calls to the WinAPI or assembler code, they decided to use barriers (lock) for volatile - these three solutions are all equally inelegant. But now, with the new C++11 standard, everything is clearly defined and there is one elegant solution: mov. Maybe older x86 processors require lock for load()? – Alex Sep 2 '13 at 16:26
Is it the same as this bug report? connect.microsoft.com/VisualStudio/feedback/details/770885 – jcoder Sep 2 '13 at 16:42
1 Answer
No. The semantics of std::memory_order_acquire do not require any processor instructions on x86/x86_64.
On x86_64, no load()/store() operation requires processor instructions (lock/fence), with a single exception: atomic.store(val, std::memory_order_seq_cst), which requires (LOCK) XCHG or, alternatively, MOV (into memory) followed by MFENCE.
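To illustrate that one exception, here is a sketch contrasting the two stores; the instruction comments follow the mapping linked below, and the exact output depends on the compiler:

#include <atomic>

std::atomic<int> a;

void stores(int val) {
    // Release store: compiles to a plain mov on x86_64;
    // the ordering only constrains the compiler.
    a.store(val, std::memory_order_release);

    // Sequentially consistent store: the one case that needs a full
    // barrier on x86_64 - either "xchg" (implicitly locked) or
    // "mov" followed by "mfence".
    a.store(val, std::memory_order_seq_cst);
}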
Processor memory-barrier instructions for x86 (except CAS), and also for ARM and PowerPC: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
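And to show why the memory orders still matter even though they compile to plain mov on x86, here is a sketch of release/acquire message passing; the names data, ready, producer, and consumer are mine:

#include <atomic>

int data;                          // ordinary, non-atomic variable
std::atomic<bool> ready{false};

void producer() {
    data = 42;                                     // ordinary store
    ready.store(true, std::memory_order_release);  // plain mov on x86, but the
                                                   // compiler may not hoist it
                                                   // above "data = 42"
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) // plain mov on x86
        ;                                          // spin until ready
    // Here the acquire/release pairing guarantees data == 42.
}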
Disassembly of load(), GCC 4.8.1 x86_64 (GDB):
temp = a.load(std::memory_order_relaxed);
temp = a.load(std::memory_order_acquire);
temp = a.load(std::memory_order_seq_cst);
0x46140b <+0x007b> mov 0x38(%rsp),%ebx
0x46140f <+0x007f> mov 0x34(%rsp),%esi
0x461413 <+0x0083> mov 0x30(%rsp),%edx
Disassembly of store(), GCC 4.8.1 x86_64 (GDB):
a.store(temp, std::memory_order_relaxed);
a.store(temp, std::memory_order_release);
a.store(temp, std::memory_order_seq_cst);
0x4613dc <+0x004c> mov %eax,0x20(%rsp)
0x4613e0 <+0x0050> mov 0x38(%rsp),%eax
0x4613e4 <+0x0054> mov %eax,0x20(%rsp)
0x4613e8 <+0x0058> mov 0x38(%rsp),%eax
0x4613ec <+0x005c> mov %eax,0x20(%rsp)
0x4613f0 <+0x0060> mfence
0x4613f3 <+0x0063> mov %ebx,0x20(%rsp)
Disassembly of load(), MSVS 2012 x86_64 - it is the same as in this bug report: http://connect.microsoft.com/VisualStudio/feedback/details/770885:
temp = a.load(std::memory_order_relaxed);
000000013FE51A1F prefetchw [a]
000000013FE51A24 mov eax,dword ptr [a]
000000013FE51A28 nop dword ptr [rax+rax]
000000013FE51A30 mov ecx,eax
000000013FE51A32 lock cmpxchg dword ptr [a],ecx
000000013FE51A38 jne main+40h (013FE51A30h)
000000013FE51A3A mov dword ptr [temp],eax
temp = a.load(std::memory_order_acquire);
000000013FE51A3E prefetchw [a]
000000013FE51A43 mov eax,dword ptr [a]
000000013FE51A47 nop word ptr [rax+rax]
000000013FE51A50 mov ecx,eax
000000013FE51A52 lock cmpxchg dword ptr [a],ecx
000000013FE51A58 jne main+60h (013FE51A50h)
000000013FE51A5A mov dword ptr [temp],eax
temp = a.load(std::memory_order_seq_cst);
000000013FE51A5E prefetchw [a]
000000013FE51A63 mov eax,dword ptr [a]
000000013FE51A67 nop word ptr [rax+rax]
000000013FE51A70 mov ecx,eax
000000013FE51A72 lock cmpxchg dword ptr [a],ecx
000000013FE51A78 jne main+80h (013FE51A70h)
000000013FE51A7A mov dword ptr [temp],eax
Disassembly of store(), MSVS 2012 x86_64:
a.store(temp, std::memory_order_relaxed);
000000013F8C1A58 mov eax,dword ptr [temp]
000000013F8C1A5C mov dword ptr [a],eax
a.store(temp, std::memory_order_release);
000000013F8C1A60 mov eax,dword ptr [temp]
000000013F8C1A64 mov dword ptr [a],eax
a.store(temp, std::memory_order_seq_cst);
000000013F8C1A68 mov eax,dword ptr [temp]
000000013F8C1A6C xchg eax,dword ptr [a]