Two-phase commit protocol

Welcome to the fourth post in the distributed systems series. In the last post, we covered ACID transactions. ACID transactions guarantee

  1. Atomicity: Either all the operation succeed or none
  2. Consistency: System moves from one consistent state to another at the successful completion of a transaction
  3. Isolation: Concurrent transactions do not interfere with each other
  4. Durability: After successful completion of a transaction all changes made by the transaction persist even in the case of a system failure

If the database is running on a single machine then it is comparatively easier to guarantee ACID semantics in comparison to a distributed database. Following are the reasons you would want to run a database in a distributed fashion:

  1. To be fault-tolerant
  2. To handle more reads and writes

Let’s assume our’s is a read intensive application and our single machine database is not able to scale to our demand. One of the solution to scale read is achieved through replication. The most common replication topology is single master and multiple slaves. All the writes go to the master and reads are performed on slaves. Data from master is replicated to the salves synchronously or asynchronously. In this post, we will assume synchronous replication.

Read More »

The Minimalistic Guide to ACID Transactions

Welcome to the third post of distributed system series. So far in this series, we have looked at service discovery and CAP theorem. Before we move along in our distributed system learning journey, I thought it will be useful to refresh our memory with understanding of ACID transactions. ACID transactions are at the heart of relational databases. The knowledge of ACID transactions is useful when building distributed applications.

Understanding ACID transactions

A transaction is a sequence of operations that form a single logical unit of work. These transactions are executed on a shared database system to perform a higher-level function. An example of higher-level function is transferring money from one account to another. Transactions represent a basic unit of change in the database. It either executed in its entirety or not at all.

ACID (Atomicity, Consistency, Isolation, and Durability) refers to a set of properties that a database transaction should guarantee even in the event of errors, power failure, etc. The canonical example of ACID transaction is transfer of funds from one bank account to another. In a single fund transferring transaction, you have to check the account balance, debit one account, and credit another transaction. ACID properties guarantee that either money transfer from one account to other occur correctly and permanently or in case of failure both accounts have the same initial state. It would be unacceptable if one account was debited but the other account was credited.

Database transactions are motivated by two independent requirements:

  1. Concurrent database access: Multiple clients can access the system at the same time. This is achieved by the Isolation property of ACID transaction.
  2. Resiliency to system failures: System remains in consistent state in case of a system failure. This is provided by Atomicity, Consistency, and Durability properties of ACID transaction.

Read More »

CAP Theorem for Application Developers

Most of us are building distributed systems. This is a fact. According to Wikipedia, a distributed system is a system whose components are located on different networked computers, which then communicate and coordinate their actions by passing messages to each other. A distributed system could either be a standard three-tier web application or it could be a massive multiplayer online game.

The goal of a distributed system is to solve a problem that can’t be solved on a single machine. A single machine can’t provide enough compute or storage resources required to solve the problem. The user of a distributed system perceives the collection of autonomous machines as a single unit.

The distributed systems are complex as there are several moving parts. You can scale out components to finish the workloads in a reasonable time. Because of numerous moving parts and their different scaling needs it becomes difficult to reason out the characteristics of a distributed applications. CAP theorem can help us.

Read More »

Service Discovery for Modern Distributed Applications

I am starting a new blog series (with no end date) from today. In this series, I will pick a topic and go in-depth so that I don’t just scratch the surface of the topic. The goal is to build a habit of learning each week and share it with the community. For the next few months, I will write on different aspects of building distributed systems. Each Wednesday, you may expect a new post.

I am sure we all have built applications where one application uses another application to do its job. Most of the time, applications communicate with each other using HTTP REST API but it can be other communication mechanisms like gRPC, Thrift, Message Queues as well. For example, if you are building an application that needs Twitter service for fetching tweets. To call Twitter API, you will need the API URL and access keys to make a successful API call. Most often we rely on static configuration either in the form of a configuration file or environment variable to get the API URL. This approach works fine when you are working with third party APIs like Twitter as their API URLs do not change often. The static configuration approach fails when we build a Microservices architecture based application. The definition of Microservices that I like is by Martin Fowler as described in his blog,

Microservice architecture style is an approach to developing a single application as a suite of small services, each running its own process and communicating with lightweight mechanisms, often an HTTP resource API.

Read More »