maintainability – Shekhar Gulati

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. – Leslie Lamport

Last six months I was building pricing engine for a client. The application was built using multiple components:

We had a change data capture pipeline built using AWS Kinesis that read data from IBM DB2 and writes to PostgreSQL to keep database in sync with changes happening in the source system
We were storing denormalised documents in AWS ElastiCache i.e. Redis
We had a batch job that was doing one time load of the PostgreSQL database
We had a near cache that helped us process our worst requests in few hundred milliseconds

When you build a system using multiple independent components then you have to keep in mind that you are building a data system that it needs to provide certain guarantees. In our case, we had to guarantee:

AWS ElastCache i.e Redis will be updated with changes happening in the source system in less than 30 seconds
Near-cache will be invalidated and updated with latest data so that clients accessing the system will get consistent results. Keeping a global cache like Redis is easier to keep in sync than keeping near-cache in sync. We came up with a novel way to keep near cache in sync with the global cache.
Data will not be lost by our change data capture pipeline. If processing of a message failed then we retry the message
There will be times when different data components i.e. PostgreSQL, Redis, and near-cache will have different state. But, eventually it should become consistent
That there will be a mechanism to observe state of the system at any point of time

Like it or not systems that we are building are becoming more and more distributed. This means there are many more ways they can fail. To help build software systems that meets the end goal, we should keep following three concerns in our mind. These should be defined as clearly as possible so that every team member keep these in mind while building software systems.

Reliability
Scalability
Maintainability

Continue reading “Thinking about software system in terms of reliability, scalability, and maintainability”