My Thoughts On Monorepo

Before we start let me give some context on my background so that you can better understand my thoughts on Monorepo.

I head technology at an IT services organization. Most of the products that I build are using Microservices architecture, have multiple frontends(web and mobile). The biggest product that I recently built had close to 30 microservices, 1 web client written in React, and native mobile app built using React Native. These numbers are nowhere near the numbers big product companies have shared.

I prefer Macroservices over Microservices. I think most products don’t need more than 10 microservices.

The reason I am clearly specifying I belong to the IT services world is because most of the stuff we consume on software development is written by engineers and senior tech people at the product companies. The stuff they write and share is based on the real problems and challenges they face at work. There are times when those problems resonate with problems other software engineers face at their work but there are times they are solutions to the problems we don’t have. So, we have to look at these solutions from the lens of our problems.

The post is based on my experience building software, leading and managing software delivery teams, and learning from the great articles written by engineers using Monorepos. Please refer to the references section for good resources on monorepos.

Let’s get back to the topic at hand.

So, what is a Monorepo?

A monorepo is a software development strategy where a single version control repository has source code for multiple projects, libraries, and applications irrespective of their programming language. Also, the organizations using Monorepo strategy often use a common build tool (like Bazel, Pants, Buck) to manage all the source code. Some of the popular examples of organizations that employ monorepo strategy are Google, Facebook, Twitter, Microsoft, and Uber.

The alternative to monorepo is polyrepo/multirepo. In multirepo, you have a separate version control repository for each component. This is the common strategy used by most organizations to structure their code. This in my view has been largely driven by Microservices architecture style and small modules movement.

As mentioned in the paper[1] (Advantages and Disadvantages of a Monolithic Repository – A case study at Google), Monorepos have following properties:

Centralization: The codebase is contained in a single repo encompassing multiple projects.
Visibility: Code is viewable and searchable by all engineers in the organization.
Synchronization: The development process is trunk-based; engineers commit to the head of the repo.
Completeness: Any project in the repo can be built only from dependencies also checked into the repo. Dependencies are unversioned; projects must use whatever version of their dependency is at the repo head.
Standardization: A shared set of tooling governs how engineers interact with the code, including building, testing, browsing, and reviewing code.

My understanding is that to successfully use monorepo you will have to satisfy all the properties. Otherwise, you will not get benefits intended from monorepo.

Advantages of Monorepos

There are valid reasons why many big product organizations prefer Monorepo. Following are the main reasons:

Reason 1: Simplified dependency management

Monorepos make dependency management simple by:

You can easily depend on other projects/modules in a monorepo without the need for artifact management tools like Nexus, Artifactory etc.
You avoid diamond dependency problem. Diamond dependences occur when a project has two dependencies which depend on the same underlying library. When a developer upgrades a dependency, they run the risk of breaking a diamond in the dependency graph.
It is easier to keep all dependencies on the same version by using a centralized way to manage version numbers.

This is simplified further by using a single build tool. I have not used Bazel, Bucks, or Pants. I was watching a talk on Twitter monorepo journey where they talked about Gradle being too slow for their use case. For the size of applications I have built Gradle has worked just fine.

Reason 2: Code sharing and reusability

The second big benefit of Monorepo is that developers can share code across projects. It is easier to enforce best practices across the code base by using monorepo. Another related point is that with monorepo we don’t end up creating silos. This is important in an enterprise setup because it leads to passing the buck and bugs falling through the cracks of the boundaries. In my experience with multirepo setup people only care about their Microservice running fine. They miss the point that value is achieved by integrating the software and collaboration. In IT service organizations where there is more bureaucracy and uneven distribution of skilled developers the problem scales very quickly with multirepo setup. Yes, I know it is a culture problem but most IT service organizations can’t burn investor dollars to build the culture.

Reason 3: Atomic changes

This I didn’t realize before I read literature on Monorepo. There is a lot of benefit in seeing related changes in a single commit. If you are working on a story that requires changes in multiple components then in a multirepo scenario you will have to see changes in multiple repositories and merge the PRs in some sequence so that you are in a healthy state. WIth monorepo you save the pain of trying to coordinate commits across multiple repositories. Also, this leads to better code reviews as all the changes are in one place.

Reason 4: Large-scale code refactoring

This is related to reason 3. With a monorepo, you can refactor the API and all of its callers in one commit. You see all the usages of an API at a single place and it is much easier to do than with multirepo where you might not even have all the code checked out. In my experience with multirepo setup most developers don’t keep all the repos updated with the upstream changes. Monorepos enables continuous improvement on global level that multirepo you do at local level.

Reason 5: Less bureaucracy

With some organizations I have worked at, you have to create ServiceNow tickets to create a repository. It can take a couple of days before you get your empty repository. With monorepo you don’t have to go through this pain.

Disadvantages of Monorepo

Nothing comes for free. There are always trade offs involved. Your job as a software engineer is to figure out if advantages weigh more than trade offs or not.

In my view following are the downsides of monorepo:

Monorepos could slow down developers because of slow build times, poor tooling, and merge conflicts.
1. Most developers still in 2020 struggle to cleanly merge code.
2. Git is slow for projects with large numbers of files and history.
There is cognitive overhead involved as developers have to get comfortable with a much larger code base than they would have with multirepo setup.
To do monorepo well require investment in tooling that most organization non-tech leadership will fail to understand

So, what’s my view on monorepos?

Before I talk about my views on Monorepo let’s understand three main constraints of IT services organization.

We work with multiple customers so we can’t keep code of all customers in the same repository even when we host their code in our version control for obvious reasons. Also, we can’t give access to all our repositories to all our developers because of security and IP related issues. So, we will keep our discussion focused on how to manage repos for a single customer.
IT services organizations have a high ratio of junior(< 5 years) to senior engineers(> 10 years) somewhere in the range of 10:1 to 100:1 or may be higher in bigger IT service organizations. The reason I am bringing this point is that monorepos requires discipline and it is tough to achieve without senior engineers driving it using a well-defined process.
People come and go at a faster rate.

Given the above two constraints and the disadvantages of monorepos it might seem that monorepos will not work for us. But, I see real problems faced by software delivery teams that can be solved by monorepos.

We build products for different customers. These products usually follow Microservice architecture, have multiple frontends – web and mobile, functional tests, scripts for deployment automation. In the multirepo strategy, you will create at minimum 5 repositories – 1 for backend with all microservices, 1 for SPA frontend, 1 or 2 repo for mobile depending on whether you are building pure native or using some native framework like React Native or Flutter, 1 for functional tests, 1 for deployment automation scripts. More often than not your team will use one repo per Microservice then only god knows how many repos you end up creating.

Let me tell you a real story. I was once working with a client that had more than 1000 repositories in their version control system. They were using the Gitlab version control platform. They had 5 products and each product was made up of multiple Microservices. The first question I asked them was to help us understand which all services and their respective code repositories were part of product A. Their chief architect had to spent a day figuring out all the repositories that made the product A. After spend a day still she was not sure if she has covered all the services.

Let’s discuss problems that I face with software delivery teams using the multirepo strategy. Just to reiterate these problems are in the context of a single customer.

Lack of accountability: Humans are good at creating boundaries and silos. They don’t care what happens outside those boundaries. They don’t care about the bigger picture.
Version drift. 10 different versions of Spring Boot, three different JDK versions, multiple versions of React, and god knows how many different versions of libraries.
Repository sprawl. Everything ends up becoming a repository.
Health of projects. Using the same tools for all projects. Code consistency. Single source of truth
1. Architecture patterns
2. Coding style
3. Testing practices
When you use multirepo everything ends up become a separate repository. And soon you lose the sight on how many repos you have.

Can monorepo work for us?

Yes. For a single customer we don’t have to scale to millions of lines of code and 1000’s of developers. For a single customer we are under a million lines of code and our delivery teams are well under 100. We don’t have large version control histories and our developers commit less than 1000 commits in a week. So, we are well below the numbers shared by Google and Facebook.

This means existing tools like Git and Maven/Gradle work fine for us.

We are already doing that for one of our big customers. Below is the mono starter repo we use for new products.

References

Advantages and Disadvantages of a Monolithic Repository – Link
Monorepos: Please don’t! – Link
Why Google Stores Billions of Lines of Code in a Single Repository – Link
Advantages of Monorepo – Link

6 thoughts on “My Thoughts On Monorepo”

cliffordberg says:

November 3, 2020 at 2:19 am

To me, the main disadvantage of a monorepo is that is treats the entire system as a single component. I want to be able to version individual components.

It is also important to recognize that dependencies are not always explicit or detectable through functional tests. In a multi-component system, in which each component executes in a separate runtime (e.g., a microservice), the only way to verify that things work together is through real time integration testing. A monorepo does not address that level of behavioral (real time) integration, yet that level of integration is where most issues are today.

1. Rajat says:
  
  November 3, 2020 at 4:22 pm
  
  I find monorepo more complex in terms of managing deployments, versioning, configuration and resolving dependencies.
  
  1. The code is open for sharing and for modification as well.
  2. Single git version for all the components is a nightmare for versioning the artifacts.
  3. Monorepo force the CI CD pipeline to deploy all components. That is actually exploitation of immutable architecture.
  4. Using maven for creating reactor builds can solve pt. 2 & 3 but that will make pipeline code quite complex and hard to implement.
  
  1. shekhargulati says:
    
    November 3, 2020 at 5:49 pm
    
    Hi Rajat,
    
    My replies:
    
    > The code is open for sharing and for modification as well.
    In my view when code is accessible to all people there are more chances that people will care about the health of the code base and you will have more people who understand the codebase. With multi-repo, it is out of sight is out of mind. This leads to the cross-pollination of ideas. In some ways, this is true DevOps as well. We have to break the barrier and silos that teams form. It is scary in a typical enterprise setup but I think it is a step in the right direction.
    
    > Single git version for all the components is a nightmare for versioning the artifacts.
    Artifact versioning can be different from the Git version. They don’t have to be the same.
    
    > Monorepo force the CI CD pipeline to deploy all components. That is actually the exploitation of immutable architecture.
    This is not true. You can determine which modules/services have changed by `git diff`. Example https://github.com/infracloudio/app-mono/blob/master/detect-changed-services.sh
    
    > Using maven for creating reactor builds can solve pt. 2 & 3 but that will make pipeline code quite complex and hard to implement.
    Yes, it will be a bit involved but doable. In my view, it is ok to pay this cost to avoid people’s issues related to accountability.
prashantdathwal says:

November 4, 2020 at 1:18 am

You can always define a micro service registry at organization level if the short term advantage is to get meta information. In my opinion, even monorepos need to define boundary. This boundary can be identified at domain level. So, you can have multiple monorepos (definitely not running in thousands) each containing microservices based on problem domain your software is solving.

1. shekhargulati says:
  
  November 4, 2020 at 11:06 am
  
  Hello Prashant,
  
  Great to hear from you after a long time….
  
  Yes, with either of the strategy you need to define correct boundaries. One thing I have realized is that it is difficult to get bounded context right the first time. Overtime after multiple levels of iterations you might get the boundary right. But in the early versions you always get that wrong.
  
  Yes, service registry can help to some extent. But, I have not seen it being used a lot in the real world. Also, it only solves part of the problem.
  
Dmitry Linov says:

November 17, 2020 at 8:03 pm

What do you think of Git X-Modules (gitmodules.com) as an alternative to monorepo? This is a way to combine multiple repositories in one and keep them separated at the same time – avoids the disadvantages of Git Submodules and monorepo approaches. The idea is that you create a mirror sync between a folder in your project repository and any other repository that is related to it (e.g. a shared library). So if you have many projects, you don’t have to put them all in one monorepo, even if they share some code.