distributed-systems – Shekhar Gulati

Key Insights From Amazon Builder Library

For the last couple of weeks I have been going over articles and videos in the Amazon Builder library. They cover useful patterns that Amazon uses to build and operate software. Below are the important points I captured while going over the material.

Reliability, constant work, and a good cup of coffee – Link

Amazon systems strive to solve problems using reliable constant work patterns. These work patterns have three key features:
- One, they don’t scale up or slow down with load or stress.
- Two, they don’t have modes, which means they do the same operations in all conditions.
- Three, if they have any variation, it’s to do less work in times of stress so they can perform better when you need them most.
There are not many problems that can be efficiently designed using constant work patterns.
- For example, If you’re running a large website that requires 100 web servers at peak, you could choose to always run 100 web servers. This certainly reduces a source of variance in the system, and is in the spirit of the constant work design pattern, but it’s also wasteful. For web servers, scaling elastically can be a better fit because the savings are large. It’s not unusual to require half as many web servers off peak time as during the peak.
Based on the examples given in the post it seems that a constant work pattern is suitable for use cases where system reliability, stability, and self-healing are primary concerns. It is fine if the system does some wasteful work and costs more. These are essential concerns for systems which others use to build their systems on. I think control plane systems fall under this category. The example of such a system mentioned in the post is a system that applies configuration changes to foundational AWS components like AWS Network load balancer. The solution can be designed using both the push and pull based approach. The pull based constant work pattern approach lends to a simpler and reliable design.
Although not mentioned in the post, constant work that the system is doing should be idempotent in nature.

Paper Summary: Monarch: Google Planet-Scale In-memory Time Series Database

This week I read Monarch paper by Google engineers. The paper covers in detail design decisions involved in building Monarch. Monarch as the title of the paper suggests is an in-memory time series database. It is used by Google monitoring system that monitors most of the Google web properties like Gmail, Youtube, and Google Maps.

Every second, the system ingests terabytes of time series data into memory and serves millions of queries.

These are some very big numbers. Most of us do not have to deal with such large volume of data in our day to day work. Reading this paper can help us understand how engineers building such system make design decisions and tradeoffs.

Microsoft’s Distributed Application Runtime (Dapr)

We are living in a world where every other day we see a new technical innovation. For most of us mere mortal it is not easy to quickly figure out if the technology is worth our time. Making sense of new technical innovations is one of the core aspects of my current role so by taking the pain of writing this series I will help myself as well.

In this post, I will cover Dapr. Dapr stands for distributed application runtime.

What does distributed application runtime means?

These days most of us are developing distributed systems. If you are building an application that uses Microservices architecture then you are building a distributed system.

A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.

Distributed application runtime provides you the facilities that you can use to build distributed systems. These facilities include:

State management. Most applications need to talk to a datastore to store state. Common examples like PostgreSQL, Redis, etc.
Pub/sub. For communication between different components and services.
Service to service communication. This also includes retries, circuit breaking.
Observability. To bring visibility into systems.
Secret management. For storing password and keys.
Many other

One important thing to note is that these are application level concerns. Distributed application runtime does not concern itself with infrastructure or network level concerns.

Paper Summary: Lessons from Giant-Scale Services

Today, I read a paper titled Lessons from Giant-Scale Services. This paper is written by Eric Brewer, the guy behind CAP theorem. It is an old paper published in 2001. The paper helps the reader build mental model on how to think about availability of large scale distributed systems.

Continue reading “Paper Summary: Lessons from Giant-Scale Services”

MemSQL Introduction: A Hybrid transactional/analytical processing database

main-how-it-works-ecosystem-diagram

MemSQL is s fast, commercial, ANSI SQL compliant, highly scalable HTAP database. HTAP databases are those that support both OLTP and OLAP workloads. It supports ACID transactions just like a regular relational database .It also supports document and geospatial data types.

MemSQL is fast because it stores data in-memory. But, it does not mean it is not durable. It maintains a copy of data on disk as well. Transactions are committed to the transaction log on disk and later compressed into full-database snapshots. One of the main reason new databases are designed as in-memory first is because memory is getting cheaper every year. It is estimated memory is becoming cheaper 40% every year.

MemSQL has tuneable durability. You can make it fully durable or completely ephemeral. It can be sync or async.

MemSQL simplifies your architecture as you don’t have to write ETL jobs to move data from one data store to another data store. This is the biggest selling point of any HTAP database.

Continue reading “MemSQL Introduction: A Hybrid transactional/analytical processing database”

System Design: Design the Amazon Recently Viewed Items Page API

I enjoy working through system design problems. It helps me think how I will design interesting features of various systems. I will post design solutions to interesting problems.

Today, I will share how I will design Amazon recently viewed items page. You can view this page by going to https://www.amazon.com/gp/history/

To me it showed last 73 items I viewed on Amazon.com. I don’t think they are showing last N items rather they are showing items that I viewed in last X days(or months) with some max limit.

Let’s redefine problem now that we better understand it.

Design the Amazon recently viewed items page API. The recently viewed items are all the items that you viewed in the last 6 months. The max count of items could be 100.

Continue reading “System Design: Design the Amazon Recently Viewed Items Page API”

Introducing Chaos Engineering to an Organization

This post explains my learning on how to introduce Chaos Engineering to an organisation. This is based on my experience of re-architecting monolithic application to Microservices based architecture. Microservices architecture style structures an application as a collection of loosely-coupled services. Microservices architecture has many benefits like independent development and deployments of services, eliminate long-term commitment to a technology stack, specialized services built by small teams, and many others. One of the drawbacks of Microservices is that it increases the surface area of failures. You now have to deal with failures related to the interaction between services and system boundaries. Our client was facing issues running their distributed application in a steady state. The issues that we faced were:

Communication failure between services. There was no clear strategy on how to handle network failure between services and how to give proper feedback to the customers of the application.
Difficulty in understanding why the whole application became unavailable when only a single service was down. Is there any single point of failure? These types of issues were not visible with usual testing.
System becoming partially unavailable when the network gets choked.
Unwanted local state leading to system unavailability when one instance of the service dies.
Out of memory errors in production services leading to complete or partial unavailability of the system.
Possible data loss issues as data replication and backup strategies were never tested in real workloads.

Continue reading “Introducing Chaos Engineering to an Organization”

A minimalistic guide to distributed tracing with OpenTracing and Jaeger

If you have ever worked on a distributed application you will know that it is difficult to debug when things go wrong. The two common tools to figure out root cause of the problem are logging and metrics. But the fact of the matter is that logs and metrics fail to give us complete picture of a situation in a distributed system. They fail to tell us the complete story.

If you are building a system using Microservices / Serverless architecture then you are building a distributed system.

Logs fail to give us the complete picture of a request because they are scattered across a number log files and it is difficult to link them together to form a shared context. Metrics can tell you that your service is having high response time but it will not be able to help you easily identify the root cause.

Logging and Metrics are not enough to build observable systems.

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. It helps bring visibility into systems. – Wikipedia

Logs, metrics, and traces are the three pillars of observability. While most software teams use logging and monitoring few of them use traces. Before we look at distributed tracing in depth, let’s define logs, metrics, and traces.

Continue reading “A minimalistic guide to distributed tracing with OpenTracing and Jaeger”

A simple introduction to distributed systems

A couple of weeks back a junior developer asked me a seemingly simple question – What is a distributed system? One question led to another and we end up spending more than an hour discussing different aspects of distributed systems. I felt my knowledge on distributed systems was rusty and I was unable to explain concepts in a simple and clear manner.

In the last two weeks since our discussion I spent time reading distributed systems literature to gain better understanding of the basics. In a series of post starting today, I will cover distributed system basics. In today’s post we will cover what and why of distributed systems.

Continue reading “A simple introduction to distributed systems”

Z-axis scaling

Last year I was building an application that had to process million of records. The processing of each record was independent but complicated. To process all the records it was taking more time than the response time we had to meet as per SLA. We employed near cache and were processing all the data in memory. This made memory utilisation of the app high.

At that time I employed the strategy to shard the data so that each application instance can process a subset of the data. This helped us improve cache utilisation and reduce memory and I/O usage of the application. These days each time scalability is mentioned Microservices is thrown as the solution. We as software developers need to keep in mind three dimensions of scalability so that we can choose the best possible strategy for the problem. These were first mentioned in the book The Art of Scalability. I read this book in 2012 when I was working with OpenShift platform as a service.

As per the book The Art of Scalability there are three dimensions of scalability as shown below

scalecube

X-axis scaling

This is the traditional way of scaling monolithic applications. We run multiple copies of an application behind a load balancer.

Y-axis scaling

In this model, we break the application vertically into multiple independent services. This is what Microservices architecture allows us to achieve.

Z-axis scaling

In Z-axis scaling each server is responsible for processing subset of the data. This is the solution that I applied for my problem.

In case you want to read more about Scale Cube I suggest you read this post.