https://exactly-once.github.io/posts/notes-on-2pc/
If there’s one distributed protocol every software engineer knows, it’s Two-Phase Commit, also known as 2PC. Although it has been in use for several decades [1], it has been in steady decline, mainly due to lack of support in cloud environments.
For quite some time it was the de facto standard for building enterprise distributed systems. That said, with the cloud becoming the default deployment model, designers need to learn how to build reliable systems without it.
Answering the question of how 2PC can be replaced requires an understanding of what the protocol provided in the first place. In spite of its popularity, there are plenty of misconceptions around 2PC. This post aims to clarify at least some of them.
NOTE: This is not “yet another introduction to 2PC”™. If you need a refresher, read one of the many descriptions out there before continuing.
2PC doesn’t provide “transactions”
2PC is an atomic commit protocol, meaning that all participants will eventually commit if all of them voted “YES”, or leave the system unchanged otherwise. When a commit operation triggered by the user finishes, either all local modifications have been applied or none of them have. The commit can take arbitrarily long to complete and, in some failure scenarios, it will hang forever.
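To make the atomic commit rule concrete, here is a minimal, failure-free sketch of a coordinator. The `Participant` class, its `will_vote` parameter, and the `two_phase_commit` function are hypothetical names made up for illustration; a real implementation would also need durable logging, timeouts, and recovery, none of which is modeled here.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory participant; its vote is fixed up front."""

    def __init__(self, name, will_vote=Vote.YES):
        self.name = name
        self.will_vote = will_vote
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. A YES vote is a promise to commit if told to.
        self.state = "prepared" if self.will_vote is Vote.YES else "aborted"
        return self.will_vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all aborted."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    decision = all(v is Vote.YES for v in votes)  # commit only on unanimous YES
    for p in participants:                        # phase 2: broadcast the decision
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

A single “NO” vote is enough to abort everywhere, which is exactly the “all or nothing” outcome described above and nothing more than that.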
Let’s look at an example to see what we mean by “no transactions”. In our scenario, we have two participants: a database and a message queue. The diagram shows 2PC execution after both participants have voted “YES” and the coordinator is committing.
2PC atomic visibility
Our example assumes that the queue transaction commits first. However, 2PC says nothing about the order in which the participants commit. The order is nondeterministic and can change on each execution, even for the same set of participants.
What’s most interesting is the outside observer, i.e. the client. It makes read requests to both participants. The read request to the message queue arrives after the commit from the coordinator. This means that the read operation returns messages written to the queue in the transaction that just committed.
In the case of the database, the read request arrives before the commit. What will be the result here? 2PC says nothing about this behavior - it’s outside of the system model defined by the protocol. The read behavior is determined not by the protocol but by the deployment configuration.
There are at least two possible behaviors. The read operation can:
- Block until the local transaction is committed - this will happen when the local transaction operates at the Serializable isolation level. This is the default configuration for Microsoft Distributed Transaction Coordinator [2] and Microsoft SQL Server, but it can be changed on a per-transaction basis,
- Return the last committed value (different from the one written by the local transaction) - this will happen when the local transaction operates with Snapshot isolation.
In summary, 2PC does not provide atomic visibility of writes when transactions committed with 2PC run alongside other local transactions at the level of each participant. The exact behavior isn’t defined by 2PC but depends on the concrete implementation of the protocol, the resources involved, and the deployment and runtime configuration.
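The visibility gap from the example can be reproduced with a small simulation. This is only a sketch under stated assumptions: `Resource` and `commit_with_observer` are hypothetical names, reads always return the last committed value (the Snapshot-style behavior from the list above), and the queue is assumed to commit first, as in the diagram.

```python
class Resource:
    """Hypothetical participant that exposes only committed state to readers."""

    def __init__(self, name, initial=None):
        self.name = name
        self.committed = initial
        self.pending = None

    def prepare(self, new_value):
        self.pending = new_value  # staged, not yet visible to readers
        return True               # vote YES

    def commit(self):
        self.committed, self.pending = self.pending, None

    def read(self):
        # Snapshot-style read: returns the last committed value, never blocks.
        return self.committed


def commit_with_observer(queue, db, observe):
    """Run the commit phase and let a client read between the two commits."""
    queue.prepare("order-created")
    db.prepare("order-row")
    queue.commit()    # the queue happens to commit first
    seen = observe()  # the client reads both participants right here
    db.commit()
    return seen
```

With `queue = Resource("queue")` and `db = Resource("db")`, calling `commit_with_observer(queue, db, lambda: (queue.read(), db.read()))` returns `("order-created", None)`: the client sees the message but not the database row, even though the transaction as a whole commits atomically.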
2PC can be highly available
Any non-trivial protocol defines the failure conditions that it’s able to tolerate, and 2PC is no exception. What is specific to 2PC is that some types of failures can make participants get “stuck”. Once a participant votes “YES”, it’s unable to make any progress until it hears back from the coordinator.
What might be the reasons for a participant getting stuck? First, the failure of the coordinator. Second, a network partition between the coordinator and the participant [3]. The likelihood of getting stuck depends on the coordinator’s availability and the probability of network failure. By making these failures less likely, we can make 2PC more available.
Participant in the ‘stuck’ state
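The “stuck” state can be expressed as a tiny state machine. The sketch below is an illustration only; the `Participant` class and its method names are hypothetical. The point is the asymmetry in `on_timeout`: before voting, a participant may abort on its own, but after voting “YES” it has to keep waiting, because the coordinator may already have decided either way.

```python
class Participant:
    """Hypothetical participant-side state machine for a single transaction."""

    def __init__(self):
        self.state = "working"

    def vote_yes(self):
        self.state = "prepared"

    def on_timeout(self):
        """Called when nothing has been heard from the coordinator for a while."""
        if self.state == "working":
            # No promise has been made yet, so a unilateral abort is safe.
            self.state = "aborted"
        # In "prepared" there is no safe unilateral move: keep waiting.
        return self.state

    def on_decision(self, commit):
        """Called when the coordinator's decision finally arrives."""
        if self.state == "prepared":
            self.state = "committed" if commit else "aborted"
        return self.state
```

A participant that has called `vote_yes()` stays in `"prepared"` no matter how many timeouts fire, and only `on_decision` moves it forward.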
This touches on the implementation and configuration aspects already mentioned. For example, in MSDTC the coordinator is a single process but can be deployed in a fail-over cluster mode. That is a deployment decision. There is also nothing in 2PC that prevents the coordinator from being implemented as a quorum of processes [4].
Finally, if all the parties (the coordinator and all the participants) are running in the same local network, on a single cluster, or inside a single VM, then what is the probability of a network partition?
As always, context is king.
Commit latency is not the biggest problem
Committing in 2PC requires 2 round trips between the coordinator and each participant, so 4n messages are generated, where n is the number of participants. This is sometimes viewed as the root cause of many practical problems with the protocol. It isn’t ideal, but it only surfaces another, bigger problem.
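The 4n figure can be checked by enumerating the messages of one failure-free round. The message labels (`PREPARE`, `VOTE`, `COMMIT`, `ACK`) and the `count_messages` function are names chosen for this sketch, not terms taken from any particular implementation.

```python
def count_messages(n):
    """Count the messages in one failure-free 2PC round with n participants."""
    messages = []
    for i in range(n):
        # First round trip: the coordinator asks, the participant votes.
        messages.append(("coordinator", i, "PREPARE"))
        messages.append((i, "coordinator", "VOTE"))
    for i in range(n):
        # Second round trip: the coordinator decides, the participant acknowledges.
        messages.append(("coordinator", i, "COMMIT"))
        messages.append((i, "coordinator", "ACK"))
    return len(messages)
```

For the database-plus-queue example, `count_messages(2)` gives 8 messages.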
The problem is potential contention at the participant level caused by locking, especially when dealing with relational databases. Holding locks means that other transactions touching the same piece of state need to wait for the lock-holding transaction to commit before they can make any progress.
This situation exists without 2PC, but the protocol pretty much always makes it worse, because in 2PC the time the locks are held is determined by the slowest participant.
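A rough way to see why the slowest participant dominates is a simplified timing model. It assumes that each participant takes its locks when the prepare phase starts and releases them when the commit message arrives; the function name, parameter names, and the numbers in the usage note are made up for illustration.

```python
def lock_hold_times(prepare_ms, commit_delivery_ms=0.0):
    """Per-participant lock hold time in milliseconds under a simplified model.

    prepare_ms maps a participant name to the time between the start of the
    prepare phase and the moment its vote reaches the coordinator.
    """
    # The coordinator cannot decide before the slowest vote has arrived.
    decision_at = max(prepare_ms.values())
    # Every participant holds its locks until the decision reaches it.
    return {name: decision_at + commit_delivery_ms for name in prepare_ms}
```

With `lock_hold_times({"queue": 2.0, "db": 50.0})`, both participants hold their locks for 50 ms, even though the queue was ready after 2 ms.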
2PC fits the cloud quite well
It is well known that 2PC is used by the cloud vendors inside their services [4] and can be used by users running at the IaaS level. That said, none of the cloud vendors support MSDTC and/or XA at the level of native cloud services, i.e. a native service can’t participate in 2PC.
Often, availability and performance are claimed to be the reasons for that. Although these two are not the strongest points of 2PC, it can be argued that security (or the lack of it) is even more important.
2PC assumes a high degree of trust between the participants and the coordinator. One could imagine a malicious user operating a specially crafted coordinator to exhaust the participants’ resources by purposefully letting transactions hang in the “stuck” state.
From the cloud vendor’s perspective, that could have quite damaging consequences. According to the protocol, a participant is not allowed to make any progress after voting “YES”. So in the case of a malicious coordinator, vendors would have to either break the protocol or allow their resources to be blocked.
Even if the cloud vendors provided their own coordinators as the only valid option, a malicious participant could still cause a lot of harm. Enabling cloud services to act as 2PC participants effectively opens the door to a Denial of Service (DoS) attack [5][6].
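The malicious-coordinator scenario can be sketched in a few lines. Everything here is hypothetical: `LockingParticipant` models a service with a fixed set of lockable rows, and `malicious_coordinator` sends a prepare request for each row and then simply never sends a decision.

```python
class LockingParticipant:
    """Hypothetical participant with a fixed set of lockable rows."""

    def __init__(self, rows):
        self.free_rows = set(rows)
        self.prepared = {}  # txn_id -> row held while waiting for a decision

    def prepare(self, txn_id, row):
        if row not in self.free_rows:
            return False             # vote NO: the row is held by a prepared txn
        self.free_rows.remove(row)
        self.prepared[txn_id] = row  # vote YES: the lock stays held
        return True

    def decide(self, txn_id):
        # Commit and abort both release the lock; only a decision can do that.
        self.free_rows.add(self.prepared.pop(txn_id))


def malicious_coordinator(participant, rows):
    # Sends PREPARE for every row and then never follows up with a decision.
    return [participant.prepare(f"evil-{row}", row) for row in rows]
```

After `malicious_coordinator(p, range(3))` on a participant with three rows, any honest `p.prepare("honest", 1)` is refused until something calls `p.decide("evil-1")`. A participant that follows the protocol has no way to make that call on its own.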
2PC is not the only commit protocol
2PC is just one possible solution to atomic commit. It works well in certain scenarios but performs poorly when used in an environment that violates its assumptions.
In fact, 2PC makes very few assumptions about the participants. Putting more constraints around transaction determinism allows for alternative approaches that minimize the lock holding time [7].
When we acknowledge the lack of atomic visibility and work with participants that guarantee commit success by their very nature (like message queues), it’s possible to end up with a commit protocol that requires a single sequential write to each participant [8].
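One way to picture such a protocol is the Outbox pattern. The sketch below uses Python’s built-in `sqlite3` module as the database and a plain dictionary as a stand-in for the message queue. The table names, function names, and message format are assumptions made for the example, and the queue write is assumed to be idempotent and to always succeed.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a business table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE outbox (msg_id TEXT PRIMARY KEY, body TEXT, sent INTEGER)"
    )
    return conn


def place_order(conn, order_id):
    # One local transaction covers both the business row and the outgoing message.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        conn.execute(
            "INSERT INTO outbox (msg_id, body, sent) VALUES (?, ?, 0)",
            (f"order-{order_id}", f"OrderPlaced:{order_id}"),
        )


def relay(conn, queue):
    # Publish unsent messages. Re-running is safe because the queue is keyed
    # by msg_id, so a repeated write overwrites the same entry.
    rows = conn.execute("SELECT msg_id, body FROM outbox WHERE sent = 0").fetchall()
    for msg_id, body in rows:
        queue[msg_id] = body  # idempotent write, assumed to always succeed
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE msg_id = ?", (msg_id,))
```

After `place_order(conn, 1)` the queue is still empty: the message becomes visible only once `relay(conn, queue)` runs. That is the lack of atomic visibility being accepted rather than hidden, and no participant ever has to wait in a prepared state.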
Summary
Hopefully, this post sheds a bit more light on 2PC and on what we actually get from the protocol. Although the era of 2PC is coming to an end, it’s good to know what guarantees we need to provide by other means in the systems we build.
- [1] Transaction Management in the R* Distributed Database Management System – Mohan et al. 1986
- [2] An implementation of 2PC built into Windows
- [3] These faults need to happen during the voting phase. There are extensions to the protocol, like the Cooperative Termination Protocol (CTP), that try to mitigate the “stuck state” problem but don’t eliminate it in the general case.
- [4] “(…) Running two-phase commit over Paxos mitigates the availability problems.” in Spanner: Google’s Globally-Distributed Database
- [5] The only case of 2PC in the cloud at the PaaS level we know of is Elastic Transactions in Azure SQL DB. That said, the coordinator is part of the database instances and Azure SQL DBs are the only allowed participants.
- [6] “(…) Ultimately, MSDTC is a single-node/cluster and local-network technology, which also manifests in its security model that is fairly difficult to adapt to a multitenant cloud system. (…)” by Clemens Vasters in Distributed Transactions and Virtualization
- [7] It’s Time to Move on from Two Phase Commit by Daniel Abadi
- [8] The Outbox pattern is a commit protocol implementation that works on two participants and assumes that writing to the message queue is idempotent and will always succeed