A simple introduction to distributed systems

A couple of weeks back a junior developer asked me a seemingly simple question – What is a distributed system? One question led to another and we end up spending more than an hour discussing different aspects of distributed systems. I felt my knowledge on distributed systems was rusty and I was unable to explain concepts in a simple and clear manner.

In the last two weeks since our discussion I spent time reading distributed systems literature to gain better understanding of the basics. In a series of post starting today, I will cover distributed system basics. In today’s post we will cover what and why of distributed systems.

Read More »

Issue #22: 10 Reads, A Handcrafted Weekly Newsletter For Humans

The time to read this newsletter is 150 minutes.

Curiosity is, in great and generous minds, the first passion and the last – Samuel Johnson

  1. Monorepos: Please don’t!: 20 mins read. In this post, Matt Klein gives reasons to why monorepo approach does not provide benefits most often cited by monorepo proponents. His recommendation is to go with polyrepo structure. The post makes four valid arguments:
    1. Organizations that use monorepo spend considerable engineering resources on building tools to work with monorepos. Most organisations don’t have such luxury.
    2. Monorepo makes it difficult to open source internal projects as you have single commit history
    3. Most VCS are not meant to be used for large monolithic repositories. There is some work done by Microsoft as part of its git VFS project but it has some rough edges.
    4. The last interesting point that post makes is The frank reality is that, at scale, how well an organization does with code sharing, collaboration, tight coupling, etc. is a direct result of engineering culture and leadership, and has nothing to do with whether a monorepo or a polyrepo is used.

    The main drawback of polyrepo approach is that it creates a culture where different teams own different parts of the code. There are few people in organization aware of the big picture. This point is beautifully put by Adam Jacob in his post – Monorepo: please do!.

    My take on this is somewhere in between. For example, if you are building a web application then I like to keep all backend Microservices in one repository and front-end application in another repo. I think the best is somewhere in between the both approaches. Taking either too far does not work.
    Read More »

Z-axis scaling

Last year I was building an application that had to process million of records. The processing of each record was independent but complicated. To process all the records it was taking more time than the response time we had to meet as per SLA. We employed near cache and were processing all the data in memory. This made memory utilisation of the app high.

At that time I employed the strategy to shard the data so that each application instance can process a subset of the data. This helped us improve cache utilisation and reduce memory and I/O usage of the application. These days each time scalability is mentioned Microservices is thrown as the solution. We as software developers need to keep in mind three dimensions of scalability so that we can choose the best possible strategy for the problem. These were first mentioned in the book The Art of Scalability. I read this book in 2012 when I was working with OpenShift platform as a service.

As per the book The Art of Scalability there are three dimensions of scalability as shown below

scalecube

X-axis scaling

This is the traditional way of scaling monolithic applications. We run multiple copies of an application behind a load balancer.

Y-axis scaling

In this model, we break the application vertically into multiple independent services. This is what Microservices architecture allows us to achieve.

Z-axis scaling

In Z-axis scaling each server is responsible for processing subset of the data. This is the solution that I applied for my problem.

In case you want to read more about Scale Cube I suggest you read this post.

How Docker uses cgroups to set resource limits?

Today, I was interested to know how does Docker uses cgroups to set resource limits. In this short post, I will share with you what I learnt.

I will assume that you have a machine on which Docker is installed.

Docker allows you to pass resource limits using the command-line options. Let’s assume that you want to limit the IO read rate to 1mb per second for a container. You can start a new container with the device-read-bps option as shown below

$ docker run -it --device-read-bps /dev/sda:1mb centos

In the above command, we are instantiating a new centos container. We specified device-read-bps option to limit the read rate to 1mb per second for /dev/sda device.

Read More »