Every software system we build uses some form of configuration files. These files change depending on the environment in which the service or system is deployed. They allow us to build a single deployable unit that can be deployed in multiple environments without any code change. We just change the configuration file for the environment and give our service the path to the external location where that file lives. The service then uses the configuration file to bootstrap itself.
Configuration files become unwieldy if not managed well. Incorrect configuration values are one of the major causes of system downtime. Most teams don’t write tests for their configuration, so bugs are often discovered only in higher environments.
I was seeing the same problem in one of my projects. There was a lack of clarity on which configuration properties change between environments and which remain the same. Also, in local and lower environments I don’t mind database credentials in my configuration files, but in higher environments I don’t want them to be present in the code.
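One way to get that clarity is to treat configuration like code and test it. The following is a minimal sketch, not the approach from my project: it assumes each environment's configuration has already been loaded into a flat dictionary, and the key names (db.url, db.username, db.password) and the environment names are hypothetical.

```python
# Hypothetical key and environment names, chosen only for illustration.
REQUIRED_KEYS = {"db.url", "db.username"}
SECRET_KEYS = {"db.password"}
HIGHER_ENVIRONMENTS = {"staging", "production"}


def validate_config(environment, config):
    """Return a list of problems found in a flat configuration dict."""
    problems = []
    for key in sorted(REQUIRED_KEYS - set(config)):
        problems.append(f"{environment}: missing required key '{key}'")
    if environment in HIGHER_ENVIRONMENTS:
        # In higher environments secrets should come from outside the file.
        for key in sorted(SECRET_KEYS & set(config)):
            problems.append(f"{environment}: secret '{key}' must not be in the file")
    return problems


def environment_specific_keys(configs):
    """Return the keys whose values differ (or are absent) across environments."""
    all_keys = set().union(*(set(c) for c in configs.values()))
    differing = set()
    for key in all_keys:
        values = {repr(c.get(key)) for c in configs.values()}
        if len(values) > 1:
            differing.add(key)
    return differing
```

Running validate_config for every environment in CI catches a missing key or a leaked credential before deployment, and environment_specific_keys answers the question of which properties actually vary between environments.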
Most of us are building systems that sit on top of other third-party systems. This is common in the FinTech ecosystem that I am currently involved with. Most neo-banks are built on modern core banking systems like Mambu, Thought Machine, etc. Core banking systems are not the only third-party systems you need to build a modern bank. You also need a CRM, a lending management system, payment switches, an AML/fraud prevention system, an engagement platform, a CMS, KYC, and a few others. Once you have selected all the ecosystem partners, you have to do custom software development to build new and innovative customer journeys and integrate these systems into a working neo-bank. In this post, I will talk about important factors you should consider when architecting systems that are powered by third-party systems.
The product I am building/architecting at work these days uses a monorepo for all our Microservices. Our Microservices are primarily built using Java 17 and Spring Boot 2.6.x. For frontend and platform code (Terraform, Helm charts, configuration files, etc.) we have separate Git repositories. We use Gradle 7.3 as our build tool. We also make use of shared libraries for code reuse. I know people suggest you should avoid using shared libraries in Microservices, but as I discussed in an earlier post, I think there are valid reasons to use them.
I prefer Monorepo for three main reasons:
Better visibility and control.
Atomic code refactoring across Microservices. This is common in the initial phase of development.
Integration is the real thing. It is where the rubber meets the road. Your teams can do their work in isolation, but it has limited value until it is integrated with your application. The whole DevOps (CI/CD) movement is about integration. You want to integrate often and deliver value faster to the customer. You can’t deliver value if you don’t integrate.
It is sad that in 2022 we are still struggling to integrate our work. We have all the tools and processes that promote integration, yet I work with so many teams that struggle to integrate their work.
In this post I will cover the things that are uncovered when you integrate. I am not giving any advice on how to integrate. My only advice on the how is that you have to make it your top priority and do it. The sooner you do it, the better.
Let’s come back to the main topic of this post. The following is a list of things that I see happen when teams integrate their work.
Here are 10 posts I thought were worth sharing this week.
#1. How eBPF will solve Service Mesh – Goodbye Sidecars – Link
Service Mesh has evolved over the years. We started with a library-based approach, then moved on to sidecar containers, and finally service mesh capabilities will become part of Linux via eBPF. We use Istio at work. I was aware that Istio has some overhead because you run a proxy with each workload. According to this post, sidecar-based service meshes increase latency 3-4 times. This is a huge cost. For a 500-node cluster where each node runs 30 pods, the sidecar proxies together use about 1TB of memory, assuming each proxy adds 70MB of overhead. eBPF is a technology to watch. It looks like the technology that will make service mesh efficient and performant. It is still early days, so we will have to wait before it becomes mainstream.
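The memory figure is easy to check. This small sketch just multiplies the numbers quoted above, assuming one proxy per pod and decimal units (1 TB = 1,000,000 MB); the function name is my own.

```python
def sidecar_memory_mb(nodes, pods_per_node, proxy_overhead_mb):
    """Total memory used by sidecar proxies, assuming one proxy per pod."""
    return nodes * pods_per_node * proxy_overhead_mb


total_mb = sidecar_memory_mb(nodes=500, pods_per_node=30, proxy_overhead_mb=70)
total_tb = total_mb / 1_000_000  # decimal units: 1 TB = 1,000,000 MB
print(f"{total_mb} MB is about {total_tb:.2f} TB")
```

That is 15,000 proxies at 70MB each, or 1,050,000 MB, which is where the roughly 1TB figure comes from.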
#2. Your Teams Don’t Need To Set Their Own Goals – Link
I have also seen this work. Most of the time teams don’t know how they should go about achieving a large goal. You as a leader then have to break the large goal down into small, manageable goals that the team can aim to achieve. I prefer goals to be realistic. They don’t have to be easy, but they should not be too difficult either. You have to build the team’s confidence, and that is built when they achieve realistic goals. It requires the leader to have clarity and to be good at decomposing problems.
These days I am working on building a next generation mobile banking platform. One of the solutions that I was designing this week was around how to handle configuration masters in Microservices. I am not talking about Microservices configuration properties here. I have not seen much written about this in the context of Microservices, so I thought I would document the solution that I am going forward with. But before we do that, let’s define what configuration masters are.
In my terminology, configuration masters are those entities of the system that are static yet configurable in nature. Examples include IFSC codes for banks, error messages, banks and their icons, account types, status types, etc. In a reasonably big application like mobile banking there will be anywhere between 50 and 100 configuration master entities. These configuration master entities have three characteristics:
They don’t change often. This means they can be cached.
Even though they don’t change often, you still want the flexibility to update existing items or add new items if required. Typically, they are modified either using database scripts or by exposing APIs that some form of admin portal (used by IT operations people) calls to add new entries or modify existing entries.
The number of rows per configuration master entity is not more than 1000. This makes them suitable for local in-memory caching.
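These three characteristics point towards a small time-based in-memory cache. The following is only a sketch of that idea and not the solution I am going forward with: the class name, the loader callable, and the five minute default TTL are all assumptions made for illustration.

```python
import time


class ConfigMasterCache:
    """Caches the full row set of each configuration master for a fixed TTL."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader    # hypothetical: callable(name) -> list of rows
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so that expiry can be tested
        self._entries = {}       # name -> (expires_at, rows)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is None or entry[0] <= now:
            rows = self._loader(name)  # e.g. a database query or an API call
            entry = (now + self._ttl, rows)
            self._entries[name] = entry
        return entry[1]

    def invalidate(self, name):
        """Drop a master so the next read reloads it, e.g. after an admin update."""
        self._entries.pop(name, None)
```

Because each entity has at most around 1000 rows, caching the whole row set per entity keeps the code simple. The TTL bounds how stale a service instance can be after someone updates a master through a database script or the admin API.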
I am building a central notification dispatch system that is responsible for sending different kinds of notifications to the end customer. It relies on multiple third-party APIs for sending the actual email/SMS notifications. The high-level architecture of the system is shown below.
NotificationSender exposes both REST and messaging interfaces for accepting consumer requests. Consumers here refer to the services that need to send notifications. This is what the notification system does:
It accepts requests from upstream services and, after validating them, stores them in the Postgres database. The notification event is written to the Postgres database in ENQUEUED state. It returns HTTP 202 ACCEPTED to the upstream service if the request is valid; otherwise it returns HTTP 400 Bad Request.
At a predefined frequency, a poller that is part of the NotificationDispatcher polls the Postgres database for new notification events, i.e. events in ENQUEUED state. For now, it respects insertion time order.
If enqueued events are found, it processes them and sends the actual notifications using the downstream SMS and Email services.
After processing the events, it changes the state of the events to processed.
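The four steps above can be sketched in a few functions. This is an illustration only and not the production code: an in-memory SQLite database stands in for Postgres, and the table name, column names, validation rule, and the PROCESSED state name are assumptions.

```python
import sqlite3

ENQUEUED, PROCESSED = "ENQUEUED", "PROCESSED"


def create_store():
    # SQLite in memory stands in for the Postgres database in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notification_event ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "channel TEXT NOT NULL, recipient TEXT NOT NULL, "
        "body TEXT NOT NULL, state TEXT NOT NULL)"
    )
    return conn


def accept(conn, request):
    """Validate a request and enqueue it; returns the HTTP status to send back."""
    if request.get("channel") not in ("EMAIL", "SMS") or not request.get("recipient"):
        return 400
    conn.execute(
        "INSERT INTO notification_event (channel, recipient, body, state) "
        "VALUES (?, ?, ?, ?)",
        (request["channel"], request["recipient"], request.get("body", ""), ENQUEUED),
    )
    conn.commit()
    return 202


def poll_once(conn, senders, batch_size=10):
    """Process one batch of ENQUEUED events in insertion order."""
    rows = conn.execute(
        "SELECT id, channel, recipient, body FROM notification_event "
        "WHERE state = ? ORDER BY id LIMIT ?",
        (ENQUEUED, batch_size),
    ).fetchall()
    for event_id, channel, recipient, body in rows:
        senders[channel](recipient, body)  # downstream SMS or email call
        conn.execute(
            "UPDATE notification_event SET state = ? WHERE id = ?",
            (PROCESSED, event_id),
        )
    conn.commit()
    return len(rows)
```

A scheduler would call poll_once at the predefined frequency. The sketch deliberately leaves out failure handling, retries, and coordination between multiple poller instances, all of which a real dispatcher has to deal with.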
A couple of months back I watched a video by Andy Pavlo, Associate Professor of Databases at Carnegie Mellon, where he made the point that databases should not use mmap. He went on to say that if there is only one thing you should take away from his database course, it is to never use mmap when designing and building database management systems. I have not used mmap before, so I was intrigued to understand it in more detail. I was aware that MongoDB used to have an mmap-based storage engine. It allowed them to achieve a faster time to market, but later they had to replace it with a new storage engine, WiredTiger, because of the issues they faced with mmap. MongoDB is not the only database that has used mmap. Some of the databases that use mmap are RavenDB, Elasticsearch, LevelDB, InfluxDB, LMDB, BoltDB, moss (key-value store from Couchbase), etc.
Given that so many databases use mmap, I wanted to understand why Andy recommends against it. I will list all of the reasons I could find in my research and from Andy’s video in this post. But before we do that, let’s first understand mmap.
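At its core, mmap maps a file into the address space of a process, so that the file can be read and written like an array of bytes in memory while the operating system decides when pages are loaded and written back. The following is a minimal sketch using Python's standard mmap module; the function name and the file contents are made up for the demo.

```python
import mmap
import os
import tempfile


def demo_mmap():
    """Write a small file, then read and modify it through a memory mapping."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(b"hello mmap")
            f.flush()
            # Length 0 maps the whole file; the default access is read/write.
            with mmap.mmap(f.fileno(), 0) as mapped:
                before = bytes(mapped[:5])  # a read is just a memory access
                mapped[:5] = b"HELLO"       # an in-place write to the mapping
                mapped.flush()              # ask the OS to write dirty pages back
        with open(path, "rb") as f:
            after = f.read()
        return before, after
    finally:
        os.remove(path)
```

Notice that there are no explicit read or write calls against the mapped region. The operating system handles the I/O behind the scenes, which is the convenience that makes mmap attractive.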
Here are 10 posts I thought were worth sharing this week.
#1. What we learnt by migrating from CircleCI to Buildkite – Link
This post covers how and why Hasura switched their CI service from CircleCI to Buildkite. They started by defining the requirements from their CI, then they evaluated different solutions, and finally introduced it in their ecosystem. Their main reason to switch CI service was cost. They reduced the cost by 50%. This required them to own some of the aspects of the CI operations. A couple of interesting things I learnt from this post:
Use of labels to trigger build. They used them to save costs.
The use of dynamic configuration. They wrote their build code in a Go program. This saved them from YAML hell. Interestingly, they use shellcheck for static analysis of shell scripts.
It is all about perspective. Tech debt brings out negative emotions in people, and it becomes difficult to sell to higher management. In this post, the author suggests we reframe tech debt as tech wealth when communicating with stakeholders. Building tech wealth means getting more value out of the software we’re creating, as well as our efforts to develop and maintain it. The author suggests two ways we can plan for tech wealth:
Allocate time within each planning cycle
Dedicate the last few cycles in a quarter
In one of the products I worked on, we used to schedule 1 day per sprint (2 weeks) for paying down tech debt. We had a sprint demo every alternate Thursday, and the next day, i.e. Friday, was scheduled for working on tech debt items. One problem with 1 day every sprint is that bigger items can’t be handled. We used to create stories for those and pick them up as part of the sprint backlog.