Welcome to the fourth post in the distributed systems series. In the last post, we covered ACID transactions. ACID transactions guarantee
- Atomicity: Either all the operation succeed or none
- Consistency: System moves from one consistent state to another at the successful completion of a transaction
- Isolation: Concurrent transactions do not interfere with each other
- Durability: After successful completion of a transaction all changes made by the transaction persist even in the case of a system failure
If the database is running on a single machine then it is comparatively easier to guarantee ACID semantics in comparison to a distributed database. Following are the reasons you would want to run a database in a distributed fashion:
- To be fault-tolerant
- To handle more reads and writes
Let’s assume our’s is a read intensive application and our single machine database is not able to scale to our demand. One of the solution to scale read is achieved through replication. The most common replication topology is single master and multiple slaves. All the writes go to the master and reads are performed on slaves. Data from master is replicated to the salves synchronously or asynchronously. In this post, we will assume synchronous replication.
How will you guarantee atomicity in the master slave replication system? To replicate data from master to slave in an atomic manner is a consensus problem. All the nodes have to come to a common consensus on whether they should all abort a transaction or all successfully commit the transaction.
The consensus problem requires agreement among a number of processes for a single data value. Some of the processes may fail or be unreliable in other ways, so consensus protocols must be fault tolerant or resilient. A consensus protocol must satisfy following properties:
- Termination: Every correct process decides some value
- Integrity: Every correct process must decide at most one value, and if it decides some value, then it must have been proposed by some process
- Validity: If a process decides a value V, then V must have been proposed by some correct process
- Agreement: Every correct process must agree on same value
One way to solve consensus problem is achieved with the Two-phase commit protocol (or 2PC in short). It is the most widely used distributed consensus protocol. Most relational databases use it to implement replication.
In the Two-phase commit protocol, there are two phases (that’s the reason it is called Two-phase commit protocol) involved. The first phase is called prepare. In this phase, a coordinator (either a separate node or the node initiating the transaction) makes a request to all the participating nodes, asking them whether they are able to commit the transaction. They either return yes or no in the response. They return yes if they can successfully commit the transaction or no if they unable to do so.
In the second phase, coordinator decides based on the votes whether to send the commit or abort request to the participating nodes. This phase is called the commit phase.
- If all the participating nodes said yes, then commit request is sent to all the nodes
- If any of the participating nodes said no, then abort request is sent to all the nodes
The above is depicted in the image below.
The two-phase protocol is similar to setting up a meeting between different individuals when all of them required in a meeting. The meeting facilitator asks all the participants if a a certain time works for them, if all of them said yes then the meeting is setup for that time else a new time is proposed and process continues.
Let’s understand how two-phase commit protocol guarantee atomicity. We will look at each of the steps performed in 2PC.
- Each transaction is given a globally unique identifier.
- A prepare request is sent to all the participants by the coordinator. These requests contain the global transaction id. If any of the request fails or timeout, the coordinator sends an abort request.
- If the prepare request is successfully received by the participating nodes then participating nodes make sure that they can commit transaction under all situations. They write all the transaction related data to disk and check if none of the constraints are violated. The participant node return yes to the coordinator if everything is fine. By returning yes response, the participating node makes a promise that it will commit a transaction no matter what. It is point of no return for the participating node.
- Once coordinator has received all the responses to prepare requests, it makes a decision on whether to commit or abort a transaction based on the received votes. The decision is persisted to a transaction log on the disk. This is done to ensure coordinator knows what it did with the transaction in case it crashes.
- Next, coordinator sends the commit or abort request to all the participating nodes. Coordinator will enforce the decision by retrying the request if some of the requests fail. If the participating node died before committing the transaction then the transaction will be committed after it has recovered.
How coordinator failure is handled?
Let’s discus scenarios when coordinator can fail
- If coordinator failed before sending the prepare request. If coordinator fails before sending the prepare request then nothing happens. Participating nodes can safely ignore.
- If coordinator failed after finishing the prepare phase but before sending the commit request to all the participating nodes, then participating nodes must wait for coordinator to recover. The participating nodes must only commit the transaction when they receive commit request from the coordinator.
- If coordinator failed in between the commit phase i.e. it sent request to some of the participating nodes and then failed. In this case also, participating nodes will have to wait to hear from the coordinator.
Two-Phase commit and CAP
Two-Phase commit is CA (consistent and available). It does not handle network partition. When network partition happens then the request just block and wait for network to recover. This is the reason many people say relational database are CA.
Drawback of Two-phase commit
- 2PC is a blocking protocol. A single node failure blocks the protocol progress until the node has recovered. If the coordinator fails then participants will resolve their transactions only when coordinator recovers.
- 2PC latency depends on the slowest node. 2PC is a N-of-N write approach so it ensures writes is written to all the nodes.