It is hard to retain good talent in tech. I agree with the reasons why employees are resigning in huge numbers. Shortage of good talent, better compensation, lack of purpose, burnout, and limited career advancement are the reasons I hear as well. One reason I don’t see covered in the post is poor leadership. It might be implicit, but I think it should be called out explicitly. Good leadership can provide a sense of belonging and purpose that helps retain good employees.
Here are 7 posts I thought were worth sharing this week.
Google: A Collection Of Best Practices For Production Services – Link
This is an amazing read. Building resilient systems is hard, and the first step is becoming aware of the practices used in the trenches. All the practices are worth knowing, and you should look for opportunities to apply them in your environment. Every few weeks I see teams struggling to make configuration changes safely; the article gives some practical advice on that. Writing fail-safe, resilient HTTP clients is not easy either. HTTP clients are used heavily in microservices architectures, and most developers consider only the happy path, where the service either succeeds or fails with expected response codes. But we also need to consider retries with jitter, timeouts, queueing, load shedding, etc., while building HTTP clients. This article covers a few more practices that can help us build resilient systems.
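To make the retries-with-jitter point concrete, here is a minimal sketch of a retry helper with capped exponential backoff and full jitter. It is not tied to any particular HTTP library; the function names and default values are my own illustration:

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base=0.1, cap=2.0):
    """Call fn(), retrying on failure with capped exponential backoff.

    Full jitter: each sleep is a random amount between 0 and the
    backoff ceiling, which spreads out retries from many clients and
    avoids thundering-herd retry storms.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            ceiling = min(cap, base * 2 ** attempt)
            time.sleep(random.uniform(0, ceiling))
```

In a real client you would catch only transient errors (timeouts, 5xx responses) rather than every `Exception`, and pair this with a request timeout so a single slow call cannot stall the loop.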
Amazing talk by Rich Hickey on the History of Clojure. In this video, Rich covers how he took a two-year self-funded sabbatical to build the first version of Clojure. Clojure is a dialect of Lisp, the second-oldest high-level programming language, and a functional language. He also covers the challenges he faced in getting the first customers and how he supported the community via email and IRC. I liked how he defined constraints for the language and looked for practical answers to real-world questions about performance. I played with Common Lisp for a month or so while reading SICP; I will try Clojure soon as well.
This is a post from the Netflix engineering team about how their CD practices enable high availability. Honestly, I was expecting to read more about system design, but I still enjoyed it. They talk about things like blue/green deployments, deployment windows, canary testing, rollbacks, and a few others. If you have been in the DevOps space for long, you might already know most of them. The key with these practices is not knowing them but actually applying them at your scale; you need a solid engineering culture to practice them.
I prefer the monorepo over the polyrepo (one repo per service) approach. The author makes a good point that the cost of a monorepo rises steeply once your team reaches 100-150 developers. The author suggests starting with a monorepo and, if development velocity starts to take a hit because of code organization, splitting it into multiple repositories.
I work in an IT services organization where the biggest projects have 50-70 developers. After deliberating a lot, I settled on a structure of four Git repos: with a team of 50 developers, no single repo has more than 25 developers working on it. I have shared my thoughts on Monorepo here.
How To Rapidly Improve At Any Programming Language – Link
This makes sense. Reading closed PRs of an open source project can be a valuable source of knowledge, and I might give it a try as well. I tried code reading earlier when I went over the code of the popular open source library Retrofit and blogged about the lessons I learnt.
Learn how to do text processing with sed, awk, and grep
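The basics of these three tools go a long way. A tiny refresher on a made-up sample file (the file name and patterns below are my own illustration):

```shell
# build a small sample file to practice on
printf 'alice 200\nbob 404\ncarol 200\n' > access.log

# grep: select lines matching a pattern
grep '404' access.log

# sed: stream-edit lines, here substituting status 200 with OK
sed 's/200/OK/' access.log

# awk: field-aware processing, here printing the first column
awk '{print $1}' access.log
```

The rough division of labor: grep filters lines, sed rewrites them, and awk works on individual fields.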
Why you should be deploying Postgres primarily on Kubernetes – Link
Covers many reasons why you may want to run Postgres on Kubernetes: production-ready configuration, connection pooling, automated backups, monitoring, high-availability setup, and a few others. It also talks about an interesting project called Stackgres that automates all of that; with a few lines of YAML you can have your full setup running on Kubernetes in less than an hour.
As mentioned in the post, clean code is also a trade-off. You really should weigh whether clean code adds real value or merely satisfies your inner clean-code ninja.
Another important point from the post: get the person whose code you want to refactor on board before you touch it. Explain your rationale and then work with them to improve the code.
> A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
The No Code Delusion. 15 mins read. I agree with the author that we are still far away from pure no-code systems. Most no-code solutions fall down when you want to do things slightly differently; they become limiting, and you struggle to stay productive. They are good for proofs of concept where you want to demonstrate value quickly.
To Serverless or Not To Serverless. 10 mins read. There is nothing new in this blog if you already know the pros and cons of serverless, but it is a well-written article that you can keep handy when you need to make a decision about serverless.
A Scientific Approach to Capacity Planning: 20 mins read. This blog talks about the Universal Scalability Law (USL). The USL takes into account the fact that computer systems don’t scale linearly, due to contention and coherency (queueing) effects.
Radical Candor: Software Edition: 30 mins read. This is a long read, but it offers a lot of good advice on how to apply Radical Candor in software development. Radical Candor is showing that you care while giving straight feedback; you have to build a relationship with people so that you can be honest with them.
The Myth of Architect as Chess Master: 10 mins read. I could relate to the author, as I often find myself in such situations. Software development is a cognitive activity, and most systems are built over years. It is difficult to parachute yourself into a system and provide meaningful advice without spending a lot of time understanding the context and the business drivers behind the project.
> When you give the best you have to someone in need, it translates into something much deeper to the receiver. It means they are worthy.
> If it’s not good enough for you, it’s not good enough for those in need either. Giving the best you have does more than feed an empty belly—it feeds the soul.
Calendar Versioning: 10 mins read. CalVer is a versioning convention based on your project’s release calendar instead of arbitrary numbers; Ubuntu’s 18.04, for example, encodes the year and month of release.
Doing a database join with CSV files: 10 mins read. xsv is a tool that you can use to join two CSV files. The author shows examples of inner join, left join, and right join. Very useful indeed.
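For reference, the join invocations look roughly like this. This is from my recollection of xsv's CLI and the file names are made up, so check `xsv join --help` for the exact flags:

```
# inner join: rows where users.csv "id" matches orders.csv "user_id"
xsv join id users.csv user_id orders.csv

# left join: keep every row from users.csv, matched or not
xsv join --left id users.csv user_id orders.csv
```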
Ordered data structures perform much better when n is large. With hash-based collections, one collision can cause O(n) performance, and range queries become O(n) if implemented using hash tables.
Ordering helps with indexes: we can reuse one index in multiple ways, whereas with hash tables we have to build separate indexes.
Ordered collections achieve locality of reference.
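The range-query point is easy to see with a sorted list: two binary searches bound the answer, something a hash table cannot do without scanning every key. A small sketch (the keys are arbitrary example data):

```python
import bisect

# an ordered structure: a sorted list of keys
keys = sorted([17, 3, 42, 8, 25, 31])

def range_query(lo, hi):
    """All keys in [lo, hi], found with two binary searches.

    Cost is O(log n + k) where k is the result size; a hash table
    would have to examine every key for the same query.
    """
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return keys[i:j]
```

The returned slice is also contiguous in memory, which is the locality-of-reference benefit in miniature.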
Xor Filters: Faster and Smaller Than Bloom Filters: 15 mins read. In this post, the author talks about Xor filters, which solve problems where you need to check whether an item exists in a set, such as a cache. Usually we solve such problems with a hash-based collection, but Xor filters work as well: they take a bit longer to build, but once built they use less memory and are about 25% faster. Bloom filters and cuckoo filters are two other common approaches to this kind of problem.
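Xor filter construction is fiddly (it needs a peeling step over a hypergraph), but for contrast, here is a minimal sketch of the baseline these newer filters improve on: a Bloom filter. The sizes and hash scheme are my own simplifications, not from the post:

```python
import hashlib

class BloomFilter:
    """Probabilistic set: no false negatives, rare false positives."""

    def __init__(self, m=1024, k=3):
        self.m = m        # number of bits
        self.k = k        # number of hash functions
        self.bits = 0     # bit array packed into one int

    def _positions(self, item):
        # derive k independent positions by salting one strong hash
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # True may be a false positive; False is always correct
        return all(self.bits >> p & 1 for p in self._positions(item))
```

An Xor filter answers the same `might_contain` question, but stores roughly 1.23 bytes per key against the Bloom filter's larger footprint at the same error rate, at the cost of requiring all keys up front at build time.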
Like GitHub, Shopify, and Airbnb, DigitalOcean began as a Rails application in 2011. The Rails application, internally known as Cloud, managed all user interactions in both the UI and public API. Aiding the Rails service were two Perl services: Scheduler and DOBE (DigitalOcean BackEnd). Scheduler scheduled and assigned Droplets to hypervisors, while DOBE was in charge of creating the actual Droplet virtual machines. While the Cloud and Scheduler ran as stand-alone services, DOBE ran on every server in the fleet.
For four years, the database message queue formed the backbone of DigitalOcean’s technology stack. During this period, we adopted a microservice architecture, replaced HTTPS with gRPC for internal traffic, and ousted Perl in favor of Golang for the backend services. However, all roads still led to that MySQL database.
By the start of 2016, the database had over 15,000 direct connections, each one querying for new events every one to five seconds. If that was not bad enough, the SQL query that each hypervisor used to fetch new Droplet events had also grown in complexity. It had become a colossus over 150 lines long and JOINed across 18 tables.
When Event Router went live, it slashed the number of database connections from over 15,000 to less than 100.
Unfortunately, removing the database’s message queue was not an easy feat. The first step was preventing services from having direct access to it. The database needed an abstraction layer.
Now the real work began. Having complete control of the event system meant that Harpoon had the freedom to reinvent the Droplet workflow.
Harpoon’s first task was to extract the message queue responsibilities from the database into itself. To do this, Harpoon created an internal messaging queue of its own that was made up of RabbitMQ and asynchronous workers. As of this writing in 2019, this is where the Droplet event architecture stands.
The first thing to consider is agility: cloud services offer significant advantages in how quickly you can spin infrastructure up and down, allowing you to concentrate on creating value on the software and data side.
But the flip side of this agility is our second factor, which is cost. The agility and convenience of cloud infrastructure comes with a price premium that you pay over time, particularly for “higher level” services than raw compute and storage.
The third factor is control. If you want full control over the hardware or network or security environment that your data lives in, then you will probably want to manage that on-premises.
Tools I discovered this week
Broot: It is a CLI tool that you can use to get an overview of a directory, even a big one. It is written in the Rust programming language. I use it as an alternative to the tree command.
xsv: It is a CLI tool for working with CSV files. It can concatenate, count, join, flatten, and do many other things; it is a Swiss Army knife for CSV. It is also written in the Rust programming language.
The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. – Stephen Hawking
System design hack: Postgres is a great pub/sub & job server: 10 mins read. I have read multiple times that people are using Postgres as a job queue or as a pub/sub solution. It does require you to get your hands dirty with SQL and write PL/pgSQL functions, but I think it could be a good solution if you don’t want to manage another pub/sub server.
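The usual building blocks are `LISTEN`/`NOTIFY` for pub/sub and `SELECT ... FOR UPDATE SKIP LOCKED` (available since Postgres 9.5) for job queues. A rough sketch of the queue side, using a hypothetical jobs table of my own invention:

```sql
-- hypothetical jobs table
CREATE TABLE jobs (
    id      bigserial PRIMARY KEY,
    payload jsonb,
    done    boolean NOT NULL DEFAULT false
);

-- worker: atomically claim one pending job; SKIP LOCKED makes
-- concurrent workers skip rows another transaction already holds
BEGIN;
SELECT id, payload
FROM jobs
WHERE NOT done
ORDER BY id
FOR UPDATE SKIP LOCKED
LIMIT 1;
-- ... process the claimed job, then mark it finished ...
UPDATE jobs SET done = true WHERE id = /* claimed id */ 1;
COMMIT;
```

Because the row lock is released only at commit, a crashed worker's job simply becomes claimable again, which is most of what you want from a job server.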
Head In The Clouds: 15 min read. This article covers how the folks at FreeAgent planned their cloud migration journey. The key points from the post are:
Co-locating has been a terrific win for us over the years, providing us with a cost-effective, high performance compute platform that has allowed us to scale to over 95,000 customers with close to 5 9’s reliability.
Growth often acts as a forcing function with regard to infrastructure. Head count has doubled, and customer count is growing quickly.
Desire for new features is another forcing function. They wanted more datacenters to increase resilience. They were reaching hardware limitations. The ops team was pressed and it was challenging to find ops engineers with the right skills. They were experimenting with ML. Serverless was becoming a go to for production. They wanted to improve deployment. And scaling the database was a challenge.
Experiments were run to research moving to AWS: Granted, any infrastructure migration would be expensive, the project complex and it would come with many challenges, but the advantages and opportunities that a full cloud migration would open up in the future were undeniable.
The decision was made to migrate to AWS!
Early on in the R&D phase we became customers of Gruntwork.io and have relied heavily on their Infrastructure as Code library and training to accelerate the project.
We built network isolation for 1,500 services to make Monzo more secure: 20 mins read. In the Security team at Monzo, one of our goals is to move towards a completely zero trust platform. This means that in theory, we’d be able to run malicious code inside our platform with no risk – the code wouldn’t be able to interact with anything dangerous without the security team granting special access.
Scaling in the presence of errors—don’t ignore them: 20 mins read. The secret to error handling at scale isn’t giving up, ignoring the problem, or even trying again; it is structuring a program for recovery, making errors stand out, and allowing other parts of the program to make decisions. Techniques like fail-fast, crash-only software, and process supervision, but also things like clever use of version numbers and the occasional bit of statelessness or idempotence. What these all have in common is that they are methods of recovery. Recovery is the secret to handling errors, especially at scale: giving up early so other things have a chance, continuing on so other things can catch up, restarting from a clean state to try again, saving progress so that things do not have to be repeated. That, or put it off for a while: buy a lot of disks, hire a few SREs, and add another graph to the dashboard.
Modern Data Practice and the SQL Tradition: 15 mins read. Over the last year I have read multiple posts suggesting we should start with the relational database route. SQL is becoming the de facto language for all things data. Most developers start looking at alternatives too early in the cycle, before understanding the pros and cons of the technology they already have. The key points from the post are:
The more I work with existing NoSQL deployments however, the more I believe that their schemaless nature has become an excuse for sloppiness and unwillingness to dwell on a project’s data model beforehand.
One can now model the “known” part of his data model in a typical relational manner and dump his “raw and unstructured” data into JSON columns. No need to “denormalize all the things” just because some element of the domain is “unstructured”.
The good thing with this approach is that one can have a single database for both their structured and unstructured data without sacrificing ACID-compliance.
SQL and relational databases have come a long way and nowadays offer almost any function a data scientist could ask.
Relational databases usually make more sense financially too. Distributed systems like MongoDB and ElasticSearch are money-hungry beasts and can kill your technology and human resources budget; unless you are absolutely certain and have run the numbers and decided that they do really make sense for your case.
Performance and stability with relational databases can be better out of the box
Hash join in MySQL 8: 10 mins read. You should read this blog if you want to learn how hash joins are implemented by databases. It will give you a good, detailed understanding of the subject.
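The core idea is simple: build an in-memory hash table over one input, then probe it with the other, giving O(n + m) instead of the O(n × m) of a nested-loop join. A toy version over lists of dicts (the table shapes are my own example, not MySQL's implementation):

```python
def hash_join(left, right, key):
    """Equi-join two lists of dict rows on a common key column."""
    # build phase: index the (ideally smaller) left input by join key
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    # probe phase: stream the right input and emit matching pairs
    out = []
    for row in right:
        for match in index.get(row[key], []):
            out.append({**match, **row})
    return out
```

Real databases add the wrinkle the blog covers: when the build side does not fit in memory, both inputs are partitioned to disk by hash of the key and joined partition by partition.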
Managing a Go monorepo with Bazel: 10 mins read. I don’t think we have a winner yet between the monorepo and multi-repo approaches to building microservices. Big organizations like Google and Facebook prefer the monorepo approach, while organizations like Netflix recommend multiple repos. This post covers how you can manage a Go monorepo using the Bazel build tool. I have not used Bazel so far, but I am seriously considering it for my personal projects.
The Value in Go’s Simplicity: 10 mins read. Go is one language that I really want to spend more time on. It is a popular language, used almost everywhere these days. In this blog, the author makes the case for Go’s simplicity. As the author mentions, the Go core team has taken simplicity to another level: to keep the language simple, they have held off on many requested features, such as generics.
When XML beats JSON: UI layouts: 5 mins read. UI layouts are represented as component trees. And XML is ideal for representing tree structures. It’s a match made in heaven! In fact, the most popular UI frameworks in the world (HTML and Android) use XML syntax to define layouts.
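As a toy illustration of the fit (simplified pseudo-markup, not real Android or HTML attribute names), a tree of UI components maps directly onto nested elements, with attributes carrying each component's properties:

```xml
<column>
  <text value="Welcome back"/>
  <row>
    <button label="Sign in"/>
    <button label="Register"/>
  </row>
</column>
```

The same tree in JSON needs explicit `"children": [...]` arrays and `"type"` fields, which is exactly the boilerplate the post argues XML avoids.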
Daily Stand-up Injection of Guilt: 10 mins read. Yegor writes, “Only weak managers need daily stand-up meetings to coordinate the team, while strong ones use more formal instruments to organize the flow of information. However, as someone noted, morning meetings are not supposed to be used by managers to coordinate anyone, but “to discuss progress, impediments and to plan.” I’m not buying it.”