Today, I was exploring the source code of the GitLab project and noticed that the git status command was slow. GitLab is an open source alternative to GitHub.
Below is the output of the git status command, run under time.
time git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
git status 0.20s user 1.13s system 88% cpu 1.502 total
The total here is the wall-clock time, in seconds, that the command took to complete.
The same was the case for the git add command.
time git add .
git add . 0.21s user 1.11s system 115% cpu 1.146 total
So both commands took more than a second to finish.
These commands are slow because they need to search the entire worktree looking for changes. When the worktree is very large, Git needs to do a lot of work.
To give some context, as of 3rd July 2022 the GitLab source code has 44147 files and 3920240 lines (2835859 of them code), and is 2.18 GB in size. I used a tool called tokei to count the files and lines. Below is the trimmed-down tokei output.
===============================================================================
 Language                 Files       Lines        Code    Comments      Blanks
===============================================================================
 BASH                        10         331         217          53          61
 Clojure                      1           3           3           0           0
 CSS                          2         380         265          10         105
 Dockerfile                  20         352         183          74          95
 Go                         218       26761       20792        1149        4820
 GraphQL                    786       13442       12711         382         349
 JavaScript                6758      687087      565968       17702      103417
 JSON                       880      265745      265691           0          54
 Makefile                     2         206         158          15          33
 Pan                          1          15          11           1           3
 PowerShell                   1          13           5           5           3
 Python                       1          47          32           7           8
 Rakefile                   116        6945        5174         509        1262
 Ruby                     26637     2250147     1696058       86570      467519
 Ruby HTML                  168        2321        1865          41         415
 Sass                       272       55825       46004        1459        8362
 Shell                       36        2343        1742         158         443
 SQL                          5       61611       50019          22       11570
 SVG                        206        1362        1327           8          27
 Plain Text                  51       22962           0       17866        5096
 XML                         15        7327        6314           4        1009
 YAML                      4003      134445      130221        2402        1822
 // removed for brevity
===============================================================================
 Total                    44147     3920240     2835859      379671      704710
===============================================================================
I am aware that Git does not scale well to very large monorepos.
A few years back, I remember reading a post by the Microsoft team explaining how they built a virtual file system to improve the performance of Git. From the 2017 post:
As a refresher, the Windows code base is approximately 3.5M files and, when checked into a Git repo, results in a repo of about 300GB. Further, the Windows team is about 4,000 engineers and the engineering system produces 1,760 daily “lab builds” across 440 branches in addition to thousands of pull request validation builds. All 3 of the dimensions (file count, repo size and activity), independently, provide daunting scaling challenges and taken together they make it unbelievably challenging to create a great experience.
I happened to read a post published by the GitHub team explaining how you can improve the performance of large monorepos with a newly released Git feature, the built-in file system monitor (FSMonitor). This feature is available from Git version 2.37.0. On Mac, you can run brew install git to get the latest version.
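Before enabling anything, it helps to confirm that the installed Git is new enough. The snippet below is a minimal sketch of such a check. The helper name version_at_least is my own and not part of Git, and the check assumes a sort that supports the -V (version sort) option.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Git): succeeds when version $1 is at least version $2.
version_at_least() {
  # sort -V orders version strings; if the required minimum sorts first
  # (or both are equal), the installed version is new enough.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# "git --version" prints something like "git version 2.37.0"; keep the third field.
installed="$(git --version | awk '{ print $3 }')"

if version_at_least "$installed" "2.37.0"; then
  echo "Git $installed: built-in FSMonitor should be available"
else
  echo "Git $installed: upgrade before enabling core.fsmonitor"
fi
```

A plain string comparison would rank 2.9.0 above 2.37.0, which is why the sketch leans on version sort instead.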
You can enable FSMonitor by running the following command.
git config core.fsmonitor true
The GitHub post also suggests enabling the untracked cache feature, so we will do that as well.
git config core.untrackedcache true
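To double-check that both settings were recorded, you can read them back from the repository configuration. This is only a sketch: show_fsmonitor_settings is a hypothetical helper name of mine, and it simply wraps git config --get.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two settings for the repository at path $1.
# "git config --get" exits non-zero when a key is not set, so fall back to "unset".
show_fsmonitor_settings() {
  local repo="$1" key value
  for key in core.fsmonitor core.untrackedcache; do
    value="$(git -C "$repo" config --get "$key" || echo unset)"
    echo "$key=$value"
  done
}

# Inspect the repository in the current directory.
show_fsmonitor_settings .
```

If both lines end in true, the settings are in place. On a platform where the built-in daemon is supported, git fsmonitor--daemon status should additionally report whether the daemon is watching the worktree.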
The first time you run the git status command after enabling these settings, it will be just as slow, or even a little slower. This is because the FSMonitor daemon needs to synchronize with the state of the index.
time git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
git status 0.23s user 1.16s system 61% cpu 2.260 total
From the second run onwards, git status will be much faster, as shown below.
time git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
git status 0.05s user 0.02s system 63% cpu 0.108 total
It took 108 ms to run the command, close to 14 times faster than the original 1.502 s.
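The speedup figure comes from dividing the two total times reported by time. As a small sketch of that arithmetic (speedup is a hypothetical helper name, not a Git command):

```shell
#!/usr/bin/env bash
# Hypothetical helper: ratio of two wall-clock times, rounded to one decimal place.
speedup() {
  # LC_ALL=C keeps the decimal point as "." regardless of the user's locale.
  LC_ALL=C awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

# Totals of the very first run (1.502 s) and the run with FSMonitor warmed up (0.108 s).
speedup 1.502 0.108
```

These are single measurements, so treat the ratio as a rough indication rather than a benchmark.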
You can learn how FSMonitor works by reading the GitHub post.