Last year I was building an application that had to process millions of records. The processing of each record was independent but complicated. Processing all the records took longer than the response time our SLA allowed. We used a near cache and processed all the data in memory, which made the application's memory utilisation high.
At that time I sharded the data so that each application instance could process a subset of it. This helped us improve cache utilisation and reduce the application's memory and I/O usage. These days, each time scalability is mentioned, microservices are thrown in as the solution. As software developers, we need to keep in mind the three dimensions of scalability so that we can choose the best strategy for the problem. These were first mentioned in the book The Art of Scalability. I read this book in 2012 when I was working with the OpenShift platform as a service.
According to The Art of Scalability, there are three dimensions of scalability, described below.
X-axis scaling is the traditional way of scaling monolithic applications. We run multiple copies of an application behind a load balancer.
In Y-axis scaling, we break the application vertically into multiple independent services. This is what a microservices architecture allows us to achieve.
In Z-axis scaling, each server is responsible for processing a subset of the data. This is the solution that I applied to my problem.
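To make the Z-axis idea concrete, here is a minimal sketch of routing records to shards by hashing a record key. The function names (`shard_for`, `partition`) and the choice of SHA-256 are my own assumptions for illustration, not the code from the application described above.

```python
import hashlib


def shard_for(record_key: str, num_shards: int) -> int:
    """Map a record key to a shard index in [0, num_shards).

    A stable hash (SHA-256 here, as an assumption) is used instead of
    Python's built-in hash(), which is randomised per process for strings.
    """
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(record_keys, num_shards):
    """Group record keys by the shard (application instance) that owns them."""
    shards = [[] for _ in range(num_shards)]
    for key in record_keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    keys = [f"record-{i}" for i in range(10)]
    for index, owned in enumerate(partition(keys, 3)):
        print(index, owned)
```

Because the same key always lands on the same instance, each instance only needs to cache the data for its own subset. Note that plain modulo hashing moves most keys when the number of shards changes; consistent hashing is the usual alternative if that matters.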
In case you want to read more about Scale Cube I suggest you read this post.
A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. – Leslie Lamport
For the last six months I was building a pricing engine for a client. The application was built using multiple components:
- We had a change data capture pipeline built using AWS Kinesis that read data from IBM DB2 and wrote it to PostgreSQL to keep the database in sync with changes happening in the source system
- We stored denormalised documents in AWS ElastiCache, i.e. Redis
- We had a batch job that did a one-time load of the PostgreSQL database
- We had a near cache that helped us process our worst requests in a few hundred milliseconds
When you build a system from multiple independent components, you have to keep in mind that you are building a data system that needs to provide certain guarantees. In our case, we had to guarantee:
- AWS ElastiCache, i.e. Redis, will be updated with changes happening in the source system in less than 30 seconds
- The near cache will be invalidated and updated with the latest data so that clients accessing the system get consistent results. A global cache like Redis is easier to keep in sync than a near cache. We came up with a novel way to keep the near cache in sync with the global cache.
- Data will not be lost by our change data capture pipeline. If processing of a message fails, we retry the message
- There will be times when the different data components, i.e. PostgreSQL, Redis, and the near cache, have different state, but eventually they should become consistent
- There will be a mechanism to observe the state of the system at any point in time
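The retry guarantee in the list above can be sketched roughly as follows. The source only says that a failed message is retried. The attempt limit, the exponential backoff, and the names `process_with_retry` and `handler` are assumptions of mine for illustration, not the pipeline's actual code.

```python
import time


def process_with_retry(message, handler, max_attempts=5, base_delay=0.1,
                       sleep=time.sleep):
    """Call handler(message), retrying on failure.

    Assumptions (not from the original pipeline): at most max_attempts
    tries, with the delay doubling after each failure. The last failure
    is re-raised so the caller can park the message rather than drop it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_handler(msg):
        # Hypothetical handler that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RuntimeError("temporary failure")
        return f"processed {msg}"

    print(process_with_retry("price-update-1", flaky_handler, base_delay=0.01))
```

Retrying means a message may be applied more than once, so this approach assumes the handler is idempotent, for example an upsert keyed by record id.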
Like it or not, the systems we are building are becoming more and more distributed. This means there are many more ways they can fail. To help build software systems that meet their end goals, we should keep the following three concerns in mind. These should be defined as clearly as possible so that every team member keeps them in mind while building software systems.
Continue reading “Thinking about software system in terms of reliability, scalability, and maintainability”