In relational database design one of the key decisions is choosing the right primary key type for tables. In this post I am talking about surrogate or synthetic primary keys. They are called surrogate or synthetic as these keys are not derived from application data. In my experience I have seen very few teams giving a proper thought process to deciding primary key types. For each table they go with the default type that is used in their organization. This would mean all tables will either have a int/bigint type or uuid or varchar. In this post I am not giving any recommendation I am only discussing how different keys affect insertion speed, data size, and how they compare against each other. You will have to do your own analysis to choose the right data type based on your use case. Different tables have different needs so you should make judgement accordingly. There is no one size fit solution.
In this post, I am using Postgres 13.3. You can install it using your operating system package manager. I am running it in a Docker container.
To follow along you can create a sample database and run SQL queries in that.
create database choosing_pk_type;
Connect to this database
Primary Key Types Comparison Summary
The numbers shown above are derived from the below table.
I get this question a lot from our customers when we recommend technologies as part of the new software development proposal. Why not MySQL instead of Postgres? Why not Kotlin instead of Java? Why not Vue.js instead of React? Why not Memcache instead of Redis? Why not Dropwizard instead of Spring Boot? Why not Flutter instead of React Native? And the list goes on. The important point to note is that the alternative is not radically different from the proposed technology. Both options more or less have the same characteristics. There are successful products built using both the technologies. In this post I want to discuss how to answer why not X questions.
In this post I am not covering situations when X is very different from proposed technology. For example, why not Cassandra instead of Postgres? Or why not Aerospike instead of Redis? Or why not Rust instead of Java?
The obvious answer to why we propose certain technologies is that we have expertise in them and it is easier to find engineers in the market for those skills. But, when people ask this question they want to hear strong technical reasons not the usual boring obvious answer. So, the obvious answer does not fly.
I have started work on a new project where we are architecting and building a new digital retail bank from scratch. To architect a bank from scratch is a huge undertaking. There are many systems and partners involved. I have only started working in the banking domain from the last one year and I can surely say the banking domain is full of interesting technical challenges. Both incumbent and challenger banks are building systems using cloud native architecture and technologies. It is an interesting time to be working in the BFSI and FinTech space. Digital banks open up a possibility of making banking possible and accessible to millions of people across the world.
Shameless plug: I am hiring. If you are a software engineer or an architect looking for interesting technical work then contact me using the contact form.
DDD stands for Domain-driven design. This term was coined and popularized by Eric Evans when he wrote the seminal DDD blue book in 2003. I read the blue book 5 years back. It made me understand why it is important to have a good understanding of the domain when building complex software. At that time I couldn’t relate much with the strategic design patterns talked about in the DDD book. These strategic patterns felt very meta to me and I failed to apply them. A month back I read the Learning Domain Driven Design book by Vlad Khononov that helped me finally understand(at least I think so) DDD strategic patterns.
As I have gained experience building software I have realized that most important skill I can build is understanding existing code bases. You can learn about a new technology stack or framework much faster by reading existing code base that uses those technologies and trying to build an existing app in a step by step manner. Today, I wanted to learn how to build a web analytics service like Google Analytics. Google Analytics is the most widely used web analytics service on the web. I found a popular open source project umami.
Umami is a simple, fast, privacy-focused alternative to Google Analytics.
We will first start by understanding the project from outside in and then we will build the backend of umami in a step by step manner.
DBeaver for creating ER diagram and as database client – Link
I am unable to sleep because of fever so I thought let me write this post. Maybe it is not the best time to write this post but who cares. A couple of weeks back I got into discussion with a customer CTO over libraries over framework. This customer wants us to prefer libraries over framework.This was mainly with regard to Spring Boot vs Dropwizard discussion.
The best definition I have read on the web about framework and library is that a framework calls your code and your code calls the library API. A framework does much more and has strong opinions. Libraries are focussed, solve one problem, and swappable (not entirely true without proper abstractions).
The customer wanted us to use Dropwizard instead of Spring Boot because of the following reasons:
Spring Boot does too much auto-magic. With something like Dropwizard you can have control over how things bind together. You can use manual dependency injection instead of a IoC container doing that magic for you. You can disable auto configuration in Spring Boot. You can also wire beans by hand if you want. But, I agree the default way is to rely on auto-configuration.
Spring Boot vulnerability surface area is higher because it is too easy to add starter jars and bring all the transitive dependencies. I think this will be mostly true with any other approach of building software in Java unless 1) you are going down the stackless way(I don’t think Java platform is there yet) 2) you have good governance on what gets in your dependencies.
Spring Boot executable size is much higher. I compared bare bones Spring Boot(spring-boot-starter-web with Tomcat) and Dropwizard(default maven archetype) executable sizes. As it turns out Spring Boot was 17M and Dropwizard was 19M.
Spring Boot startup time is higher. This depends a lot on what you are doing at startup. The bare bone spring boot app starts in 1.64 seconds whereas the bare bone Dropwizard app took 1.526 seconds to startup.
Spring Boot consumes much more memory. This was true. Spring Boot loaded 7591 classes whereas Dropwizard loaded 6255 classes. Also, heap space consumption of Spring Boot was twice compared to Dropwizard.
Spring Boot apps are difficult to debug. I agree exception stacktraces are too long at times and it takes a minute or two to reach your calling code. But, I personally never had much trouble debugging Spring Boot apps. I mostly rely on good tests and logging to debug stuff.
Lastly, they wanted us to follow a general principle – prefer libraries over frameworks.
The funny part is Spring Boot does not call itself a framework and Dropwizard documentation states Dropwizard straddles the line between a library and a framework.
We went with Dropwizard :). I respect their decision and I think their reasons have merit. I myself have seen too many badly architected/built Spring Boot apps so I am open to trying out a new, simpler, and better alternative.
I have read the Brandon Smith post – Write Libraries, not Frameworks. I also think libraries over frameworks is a good architecture principle that we should strive for. The only problem is when we apply principles blindly without understanding the context.
I think principles like favouring libraries over frameworks are fundamental. For these principles to work you need to have the right context and provide the right environment. I think it will work for you when:
You have a good engineering team that understands the cost of adding libraries. There is no free lunch. It does not work in a typical bottom heavy pyramid team structure where teams are considered feature factories. They will add any library under the roof to deliver features if your mindset is not aligned with the principle.
You have good governance. It is enforced by healthy code reviews, automation (aka fitness functions), architecture knowledge sharing sessions, Microservices production readiness reviews, and architects with the skin in the game.
You spend effort and resources on developer experience to build tooling that makes it easy to scaffold new Microservices with your opinions and choices baked in. I am not sure how it will be any different from a pure framework approach. You will end up building your own Microservices framework with your library choices.
You train your software engineers to buy into this methodology. They will have to unlearn the existing way and learn the new way to build software.
You understand productivity might take a hit till developers understand the new way to build software. Frameworks give you a productivity boost by helping you get started faster and solving common problems for you.
You use automated checks to continuously prove that software is not deviating from the principle. You can write a build tool task that fails the build if executable size reaches a threshold. You can write tests that fail the build if people use certain libraries. All of this is possible. You need to invest the time of the right engineers to make this happen.
I don’t know whether this will work in our environment or context. It will depend on if we can walk the talk. I have seen too many times all the good things thrown under the bus when business puts pressure for features.
Looks like medicine(or writing this post) has done its job. I started writing at 2:28am and now at 3:39am my fever is down and I am feeling better.
Last few months I have spent a lot of time doing code reviews. During the code review exercise I also pair with developers to refactor and improve the quality of their pull requests (PR). I care about two things in code reviews – correctness and understandability. In this post I will not focus on correctness (I might write a future post on this). Today, I want to focus on most important aspect of making code easier to understand – good names. Most of the time that I spend in the code review is coming up with the intention revealing names for classes, methods, interfaces, variables, packages, modules, and Microservices. I find most developers (irrespective of experience level) struggle to come up with good names.
There are only two hard things in Computer Science: cache invalidation and naming things.
In this post I will list three reasons I think developers struggle to come up with good names.
All software systems we build use some sort of configuration files. These configuration files change depending on the environment in which your service/system is deployed. They allow us build a single deployable unit that can be deployed in multiple environments without any code change. We just change our configuration file depending on the environment and provide our service path to an external location where the configuration file exists. And, our service uses the configuration file to bootstrap itself.
Configuration files become unwieldy if not managed well. Incorrect configuration values is one of the major reasons for system downtime. Most teams don’t write tests for their configurations so a lot of times bugs are discovered in higher environments.
I was also seeing the same problem in one of my projects. There was a lack of clarity on which configuration properties change between different environments and which remains the same. Also, in a local and lower environment I don’t mind database credentials in my configuration files but for a higher environment I don’t want them to be present in the code.
Most of us are building systems that sit on top of other third party systems. This is common in the FinTech ecosystem that I am currently involved with. Most Neo-banks are built on modern core banking systems like Mambu, Thought Machine, etc. Core banking systems are not the only third party systems you need to build a modern bank. You also need a CRM, lending management system, payment switches, AML/fraud prevention system, Engagement platform, CMS, KYC, and a few others. Once you have selected all the ecosystem partners you have to do custom software development to build new and innovative customer journeys and integrate these systems into a working neo-bank. In this post, I will talk about important factors you should consider when architecting systems that are powered by third party systems.
The product I am building/architecting at work these days uses Monorepo for all our Microservices. Our Microservices are primarily built using Java 17 and Spring Boot 2.6.x. For frontend and platform code(Terraform, Helm charts, configuration files, etc) we have different Git repositories. We use Gradle 7.3 as our build tool. We also make use of shared libraries for code reuse. I know people suggest you should avoid using shared libraries in Microservices but as I discussed in an earlier post I think there are valid reasons to use shared libraries in Microservices.
I prefer Monorepo for three main reasons:
Better visibility and control.
Atomic code refactoring across Microservices. This is common in the initial phase of development.