Enforcing Logging best practices by writing ArchUnit tests

We have the following three logging best practices:

  1. All loggers should be final variables. So, we prefer

   private final Logger logger = LoggerFactory.getLogger(MyService.class); // good

Instead of

   private static final Logger LOGGER = LoggerFactory.getLogger(MyService.class); // bad

Using constant-style naming makes the code look ugly and requires developers to hold the shift key to type the upper-case variable name. This breaks the flow, so we prefer regular field variable naming.

  2. All logs should have a description and context. So, we prefer

   logger.info("Event is already processed so not processing it again [eventId={}, eventDbId={}, eventType={}]", eventId, event.getId(), eventType); // good

instead of

   logger.info("Event is already processed so not processing"); // bad

We want each log statement to have enough context to debug problems. The bad logging statement does not help you understand which event it was logged for; all such statements look the same.

  3. All error logs should have the exception in the context. So, we prefer

   logger.error("Exception while processing event [eventId={}]", eventId, exception); // good

instead of

   logger.error("Exception while processing event [eventId={}]", eventId); // bad

To help developers catch these issues before raising their pull requests, we have written ArchUnit tests that enforce these practices, so the local build fails when a practice is violated. You can read my earlier post on ArchUnit[1] if you are new to it.
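
A structural rule like the first one maps directly to ArchUnit. Below is a minimal sketch of such a test, assuming a recent ArchUnit version with JUnit 5 and SLF4J on the classpath; the package name com.myorg is a placeholder.

   import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.fields;

   import com.tngtech.archunit.core.domain.JavaClasses;
   import com.tngtech.archunit.core.importer.ClassFileImporter;
   import com.tngtech.archunit.lang.ArchRule;
   import org.junit.jupiter.api.Test;
   import org.slf4j.Logger;

   class LoggingRulesTest {

       @Test
       void loggersShouldBePrivateFinalInstanceFields() {
           // Import the production classes; com.myorg is a placeholder package.
           JavaClasses classes = new ClassFileImporter().importPackages("com.myorg");

           // Best practice 1: loggers are private final instance fields,
           // not static constants.
           ArchRule rule = fields().that().haveRawType(Logger.class)
                   .should().bePrivate()
                   .andShould().beFinal()
                   .andShould().notBeStatic();

           rule.check(classes);
       }
   }

Rules 2 and 3 concern call arguments rather than structure, so they need custom ArchUnit conditions rather than the one-liner syntax shown here.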

Continue reading “Enforcing Logging best practices by writing ArchUnit tests”

Why an architecture with three Availability Zones is better than the one with two Availability Zones?

A couple of months back, a customer asked why we were proposing a three Availability Zone (AZ, in short) architecture instead of two. Their main question was which failure modes 3 AZs guard against that 2 AZs cannot. We gave the following two reasons:

  • We proposed 3 AZs for improved availability. Since services and instances are spread across 3 AZs, losing one AZ costs you a third of your capacity; with two AZs you can lose half of it.
  • If there are services where you need to manage quorum (say, you want to run your own Cassandra), it is better to have three AZs. A quorum of 2 out of 3 nodes survives the loss of one AZ, whereas with two AZs a single AZ failure can take out the majority.

They were not very convinced, so we agreed to start with the two-AZ solution.

Continue reading “Why an architecture with three Availability Zones is better than the one with two Availability Zones?”

The case for frameworks over libraries (Spring Boot vs Dropwizard)

I am working with a customer who decided to go with Dropwizard instead of Spring Boot (or the Spring ecosystem). I initially respected their decision and decided to give Dropwizard a fair chance. Now, after spending a couple of months building a system that uses Dropwizard, I don't recommend it. I wrote about my initial thoughts here.

There are three main advantages of a battle-tested and widely used framework like Spring:

  • Good APIs
  • Solutions to common problems
  • Good searchable documentation

Let me help you understand that by taking the example of integrating Kafka in a Dropwizard application. The official Dropwizard organization provides a Kafka bundle[1], so we decided to use it for adding Kafka support. I found the following issues with it:

Poor API: When you create the KafkaConsumerBundle in the *Application class, you are forced to provide an implementation of ConsumerRebalanceListener. KafkaConsumerBundle does not do anything with it, but it forces you to provide it [2]. If you read the Kafka documentation, you provide a ConsumerRebalanceListener not at creation time but when you subscribe. The listener is typically used to commit offsets when partitions are rebalanced. There is also an open issue [3] in the GitHub repo on the same, without any answer from the maintainers.
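
For contrast, here is a sketch of how the plain Kafka consumer API takes the listener at subscribe time, as the Kafka documentation describes; the bootstrap servers, group id, and topic name are placeholders.

   import java.util.Collection;
   import java.util.List;
   import java.util.Properties;

   import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
   import org.apache.kafka.clients.consumer.KafkaConsumer;
   import org.apache.kafka.common.TopicPartition;

   public class SubscribeExample {
       public static void main(String[] args) {
           Properties props = new Properties();
           props.put("bootstrap.servers", "localhost:9092");
           props.put("group.id", "event-processor");
           props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
           props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

           KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
           // The listener is passed to subscribe(), not to the constructor.
           consumer.subscribe(List.of("events"), new ConsumerRebalanceListener() {
               @Override
               public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                   // Commit processed offsets before losing partition ownership.
                   consumer.commitSync();
               }

               @Override
               public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                   // Seek to externally stored offsets here if needed.
               }
           });
       }
   }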

Incomplete configuration: The Dropwizard Kafka bundle does not support all the Kafka producer and consumer configuration properties. For example, it is often recommended to set the producer's number of retries to Integer.MAX_VALUE and rely on delivery.timeout.ms to limit the time spent retrying. Since it is not a configuration option in the Dropwizard bundle, you have to hardcode it during bundle creation.
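
For illustration, here are those recommended settings expressed with the plain Kafka client constants; with the bundle, values like these end up hardcoded in Java instead of flowing from the YAML configuration. The 120-second timeout is just an example value.

   import java.util.Properties;

   import org.apache.kafka.clients.producer.ProducerConfig;

   public class ProducerRetrySettings {
       // Retry settings the Kafka docs recommend: retry practically forever,
       // and bound the total time spent sending with delivery.timeout.ms.
       static Properties retrySettings() {
           Properties props = new Properties();
           props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
           props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);
           return props;
       }
   }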

Missing solutions to common problems: Any real-world Kafka application needs to solve these three common problems. Spring Kafka, part of the Spring ecosystem, provides solutions to all of them (see the sketch after this list).

  1. The bundle does not provide a serializer/deserializer for JSON or any other format. You have to write one yourself or find a library that implements it.
  2. Handling of Poison pill messages using ErrorHandlingDeserializer
  3. Publishing of Poison pill messages to a dead letter topic
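
Here is a hedged sketch of how problems 2 and 3 can be wired up with Spring Kafka in a Spring Boot application. The class and property names are from Spring Kafka; the bean wiring itself is illustrative.

   import org.springframework.context.annotation.Bean;
   import org.springframework.context.annotation.Configuration;
   import org.springframework.kafka.core.KafkaTemplate;
   import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
   import org.springframework.kafka.listener.DefaultErrorHandler;

   @Configuration
   public class KafkaErrorHandlingConfig {

       // Problem 3: records that keep failing (e.g. poison pills) are
       // published to a <topic>.DLT dead letter topic by the recoverer.
       @Bean
       public DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
           return new DefaultErrorHandler(new DeadLetterPublishingRecoverer(template));
       }

       // Problem 2 is handled declaratively by wrapping the real deserializer
       // with ErrorHandlingDeserializer, e.g. in application.properties:
       //   spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.ErrorHandlingDeserializer
       //   spring.kafka.properties.spring.deserializer.value.delegate.class=org.springframework.kafka.support.serializer.JsonDeserializer
       // The JsonDeserializer delegate also covers problem 1 (JSON serde).
   }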

Conclusion

Yes, you can write your own bundle that fixes all these issues. But then you are doing undifferentiated work. Your team could spend that time writing business logic rather than writing and maintaining this code. There is no right or wrong answer here; there is a cost that has to be paid when you take these decisions, and you should keep that in mind. There is no free lunch.

Resources

  1. https://github.com/dropwizard/dropwizard-kafka
  2. https://github.com/dropwizard/dropwizard-kafka/blob/master/src/main/java/io/dropwizard/kafka/KafkaConsumerBundle.java#L33
  3. https://github.com/dropwizard/dropwizard-kafka/issues/179

How does FerretDB work?

In recent weeks, I have come across FerretDB on multiple occasions, and I thought, why not take a closer look? I took a particular interest in FerretDB because it is a MongoDB implementation on top of my favourite database, PostgreSQL.

While I do have high-level thoughts on how I would go about building MongoDB on top of Postgres, I wanted to confirm, validate, and learn how the FerretDB team has been doing it.

FerretDB (previously MangoDB) was founded to become the de facto open-source substitute for MongoDB. FerretDB is an open-source proxy that converts MongoDB 5.0+ wire protocol queries to SQL, using PostgreSQL as the database engine.

At a high level, FerretDB is a proxy that implements the MongoDB wire protocol that MongoDB clients speak. After a client establishes a connection, FerretDB translates each query the client sends into SQL queries Postgres understands.

In the recent release (0.5.0) of FerretDB, it is also possible to use it as a library rather than as a proxy. Using FerretDB as a library removes one network hop, which leads to better performance. This is only possible for applications written in Go, since FerretDB itself is implemented in Go.

Continue reading “How does FerretDB work?”

My Notes on GitLab Postgres Schema Design

I spent some time going over the Postgres schema of GitLab. GitLab is an open-source DevOps platform and an alternative to GitHub that you can self-host.

My motivation for understanding the schema of a big project like GitLab was to compare it against schemas I am designing and to learn some best practices from their schema definition. I can surely say I learnt a lot.

I am aware that best practices are sometimes context-dependent, so you should not apply them blindly.

The GitLab schema file structure.sql [1] is more than 34,000 lines of code. GitLab is a monolithic Ruby on Rails application. The popular way to manage schema migrations in Rails is the schema.rb file. The reason the GitLab team decided to adopt structure.sql instead is mentioned in one of the issues [2] in their issue tracker:

Now what keeps us from using those features is the use of schema.rb. This can only contain standard migrations (using the Rails DSL), which aim to keep the schema file database system neutral and abstract away from specific SQL. This in turn means we are not able to use extended PostgreSQL features that are reflected in schema. Some examples include triggers, postgres partitioning, materialized views and many other great features.

In order to leverage those features, we should consider using a plain SQL schema file (structure.sql) instead of a ruby/rails standard schema schema.rb.

The change would entail switching config.active_record.schema_format = :sql and regenerate the schema in SQL. Possibly, some build steps would have to be adjusted, too.

Now, let’s go over the things I learnt from Gitlab Postgres schema.

I am building a course on how to build production apps using LLMs. We will cover topics like prompt engineering, RAG, search, testing and evals, fine tuning, feedback analysis, and agents. You can register now and get 50% discount. Register using form – https://forms.gle/twuVNs9SeHzMt8q68

Continue reading “My Notes on GitLab Postgres Schema Design”

Improve Git Monorepo Performance

Today, I was exploring the source code of the GitLab project and experienced poor performance of the git status command. GitLab is an open-source alternative to GitHub.

Below is the output of the git status command:

 time git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean
git status  0.20s user 1.13s system 88% cpu 1.502 total

The total here is the number of seconds it took for the command to complete.

The same was the case for the git add command:

time git add .
git add .  0.21s user 1.11s system 115% cpu 1.146 total

So both commands took more than a second to finish.

These commands are slow because they need to search the entire worktree looking for changes. When the worktree is very large, Git needs to do a lot of work.

Continue reading “Improve Git Monorepo Performance”

Why I like gRPC?

I have started using gRPC for service-to-service communication between microservices, and I am liking it so far.

I still prefer to expose APIs to the external world (browser, mobile, or third-party) using either REST or GraphQL. I am aware that you can use gRPC in mobile apps and grpc-web in web frontends, but I have not used gRPC for those use cases yet.

I have earlier used REST (JSON over HTTP) and/or some form of event-driven communication for service-to-service communication. Both work, but the programming model leaves much to be desired.

gRPC is a modern and efficient HTTP/2-based inter-process communication style developed by Google. It is heavily used at Google and at many other major tech companies such as Square, Lyft, Netflix, Cockroach Labs, and Salesforce.

gRPC builds on top of HTTP/2 and SSL as an efficient and secure transport layer. It uses Protocol Buffers for defining API contracts and for efficient serialization. The gRPC core provides the framework for efficient service-to-service communication, and gRPC tooling generates the clients and servers used by the application tier.
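
To make the tooling point concrete, here is a minimal client sketch, assuming grpc-java and the stubs generated from the canonical Greeter hello-world proto (io.grpc.examples.helloworld); host and port are placeholders.

   import io.grpc.ManagedChannel;
   import io.grpc.ManagedChannelBuilder;
   import io.grpc.examples.helloworld.GreeterGrpc;
   import io.grpc.examples.helloworld.HelloReply;
   import io.grpc.examples.helloworld.HelloRequest;

   public class GrpcClientSketch {
       public static void main(String[] args) {
           // A channel is the HTTP/2 connection gRPC multiplexes calls over.
           ManagedChannel channel = ManagedChannelBuilder
                   .forAddress("localhost", 50051)
                   .usePlaintext() // skip TLS; fine for a local sketch only
                   .build();

           // Blocking stub generated by protoc from the Greeter service.
           GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
           HelloReply reply = stub.sayHello(HelloRequest.newBuilder().setName("gRPC").build());
           System.out.println(reply.getMessage());

           channel.shutdown();
       }
   }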

Now that we understand the gRPC basics, I will share my reasons for preferring gRPC for service-to-service communication.

Continue reading “Why I like gRPC?”

Choosing Primary Key Type in Postgres

In relational database design, one of the key decisions is choosing the right primary key type for tables. In this post I am talking about surrogate or synthetic primary keys, so called because these keys are not derived from application data. In my experience, very few teams put proper thought into deciding primary key types. For each table they go with the default type used in their organization, which means all tables get an int/bigint, uuid, or varchar key.

In this post I am not giving any recommendation; I am only discussing how different keys affect insertion speed and data size, and how they compare against each other. You will have to do your own analysis to choose the right data type for your use case. Different tables have different needs, so you should make the judgement accordingly. There is no one-size-fits-all solution.

In this post, I am using Postgres 13.3. You can install it using your operating system package manager. I am running it in a Docker container.

To follow along, you can create a sample database and run the SQL queries in it.

create database choosing_pk_type;

Connect to this database

\c choosing_pk_type;

Primary Key Types Comparison Summary


Continue reading “Choosing Primary Key Type in Postgres”

Why not X?

I get this question a lot from our customers when we recommend technologies as part of a new software development proposal. Why not MySQL instead of Postgres? Why not Kotlin instead of Java? Why not Vue.js instead of React? Why not Memcached instead of Redis? Why not Dropwizard instead of Spring Boot? Why not Flutter instead of React Native? The list goes on. The important point to note is that the alternative is not radically different from the proposed technology. Both options have more or less the same characteristics, and there are successful products built using both. In this post I want to discuss how to answer why not X questions.

In this post I am not covering situations where X is very different from the proposed technology. For example, why not Cassandra instead of Postgres? Or why not Aerospike instead of Redis? Or why not Rust instead of Java?

The obvious answer to why we propose certain technologies is that we have expertise in them, and it is easier to find engineers in the market with those skills. But when people ask this question, they want to hear strong technical reasons, not the usual boring obvious answer. So the obvious answer does not fly.

Continue reading “Why not X?”

Applying DDD To Architect a Digital Bank Part 1: Analyzing domain using Subdomains

I have started work on a new project where we are architecting and building a new digital retail bank from scratch. Architecting a bank from scratch is a huge undertaking; there are many systems and partners involved. I have been working in the banking domain for only the last year, and I can surely say it is full of interesting technical challenges. Both incumbent and challenger banks are building systems using cloud-native architecture and technologies. It is an interesting time to be working in the BFSI and FinTech space. Digital banks open up the possibility of making banking accessible to millions of people across the world.

Shameless plug: I am hiring. If you are a software engineer or an architect looking for interesting technical work then contact me using the contact form.

DDD stands for Domain-driven design. The term was coined and popularized by Eric Evans when he wrote the seminal DDD blue book[1] in 2003. I read the blue book 7 years ago. It made me understand why it is important to have a good understanding of the domain when building complex software.

Continue reading “Applying DDD To Architect a Digital Bank Part 1: Analyzing domain using Subdomains”