Here are 10 posts I thought were worth sharing this week.
#1. How eBPF will solve Service Mesh – Goodbye Sidecars – Link
Service Mesh has evolved over the years. We started from a library based approach, then moved on to the sidecar containers, and finally service mesh capabilities will become part of Linux via eBPF. We use Istio at work. I was aware that there is some overhead of Istio as you have a proxy with each workload. As per this post, sidecar container based service meshes add 3-4 times of latency. This is a huge cost. For a 500 node cluster where each node runs 30 pods this adds up to 1TB of memory used by all sidecar proxies. This assumes each proxy adds 70MB of overhead. eBPF is a technology to watch. It looks like it is the technology that will make service mesh efficient and performant. It is still in early days so we will have to wait before it becomes mainstream.
#2. Your Teams Don’t Need To Set Their Own Goals – Link
I have also seen this working. Most of the time teams don’t know and understand how they should go about achieving a large goal. Then, you as a leader have to break a large goal down to small, manageable goals that the team can aim to achieve. I prefer goals to be realistic. They don’t have to be easy but they don’t have to be too difficult as well. You have to build the team’s confidence and that is built when they achieve realistic goals. It requires a leader to have clarity and they should be good at decomposing problems.
Here are 10 posts I thought were worth sharing this week.
#1. What we learnt by migrating from CircleCI to Buildkite – Link
This post covers how and why Hasura switched their CI service from CircleCI to Buildkite. They started by defining the requirements from their CI, then they evaluated different solutions, and finally introduced it in their ecosystem. Their main reason to switch CI service was cost. They reduced the cost by 50%. This required them to own some of the aspects of the CI operations. A couple of interesting things I learnt from this post:
Use of labels to trigger build. They used them to save costs.
The use of dynamic configuration. They wrote their build code in a Go program. This saved them from YAML hell. Interestingly, they use shellcheck t static analysis of shell script
It is all about perspective. Tech Debt brings negative emotions in people and it becomes difficult to sell it to higher management. In this post, the author suggests we reframe tech debt as tech wealth while communicating with stakeholders. Building tech wealth means getting more value out of the software we’re creating, as well as our efforts to develop and maintain it. Author suggests two ways we can plan for tech wealth:
Allocate time within each planning cycle
Dedicate the last few cycles in a quarter
In one of the products I worked on we used to schedule 1 day per sprint (2 weeks) for paying tech debt. We had sprint demo every alternate Thursday and the next day i.e. Friday was scheduled for working on tech debt items. One problem with 1 day every sprint is that bigger items can’t be handled. We used to create stories for them and pick them as part of the sprint backlog.
This is an amazing read. Etsy engineer Salem Hilal shares their ES6 to Typescript journey. In this post, he covers the strategy, technical challenges they faced, tooling they built, and how they educated their engineers to write effective Typescript code. Etsy has been built over the last 16 years and they had 17000 JS files. Migrating such a codebase is a multi year effort. You need to have a clear plan and ensure there are no tail migration issues.
A couple of months back a customer wanted us to migrate their 20+ TB Oracle database to Postgres. They had hundreds of stored procedures written in Oracle. Also, their batch processing jobs were written in stored procedures. They wanted to do the complete migration in a couple of months. We politely told them it is not possible. They went with another vendor that said they could do it in two months. Migrations are very risky. There are so many unknowns involved. For a vendor it is much more difficult because they don’t even understand your functional requirements and code base. For migrations I prefer to be safe than sorry.
Here are 11 posts I thought were worth sharing this week.
To Learn a New Language, Read Its Standard Library – Link
I like the idea of learning a new language by reading its standard library. You learn the idiomatic way of writing code in a language by reading source code written by its original authors. I am planning to learn Rust. I will also give this approach a try. There are two limiting factors when you might struggle with this approach 1) Poor documentation 2) when the standard library is implemented in a lower level language.
In this post, Subbu Allamaraju shares his thoughts on how you can be both a nice and effective leader . He talks about six different leadership styles and how those leadership styles create positive and negative climates. I am in my first engineering leadership role and still figuring out my leadership style. Based on my limited leadership experience I think a leader can have multiple leadership styles depending on the situation. There are times you have to course correct and change your leadership style based on the situation and context. Also, I think leaders can be “nice” and “not nice” depending on the context. Leadership is hard.
42 things I learned from building a production database – Link
Not a deep technical post. Many useful pieces of advice by Mahesh Balakrishnan in this post. He worked on a Chubby like system at Facebook. My favorites:
Be conservative on APIs and liberal with implementations
When designing APIs, write code for one implementation; plan actively for the second implementation; and hope/pray that things will work for a third implementation.
Anything that can’t be measured easily (e.g., consistency) is often forgotten; pay particular attention to attributes that are difficult to measure
It is hard to retain good talent in tech. I agree with the reasons on why employees are resigning in huge numbers. Shortage of good talent, better compensation, lack of purpose, burnout, career advancement are the reasons that I hear as well. One reason that I don’t see covered in the post is poor leadership skills. It might be implicit but I think it should be called out as well. Good leadership can provide a sense of belonging and purpose that can help retain good employees.
Here are 7 posts I thought were worth sharing this week.
Google: A Collection Of Best Practices For Production Services – Link
This is an amazing read. Building resilient systems is hard. The first step to building resilient systems is to become aware of the practices that are used in the trenches. All the practices are worth reading/knowing and you should look for opportunities to apply them in your environment . Every few weeks I see teams struggling with making configuration changes safely. Article gives some practice advice on the same. Writing fail-safe and resilient HTTP clients is not easy. HTTP clients are used heavily in Microservices architecture. Most developers consider the happy path when service either succeeds or fails with expected response codes. But, we need to consider retries with jitter, timeouts, queueing, load shedding, etc while building HTTP clients. This article covers a few more practices that can help us build resilient systems.
Amazing talk by Rich Hickey on the History of Clojure. In this video, Rich covers how he took a two year self-funded sabbatical to build the first version of Clojure. Clojure is a dialect of Lisp, the second-oldest high-level programming language. Lisp is a functional programming language. He also covers challenges he faced to get the first customers, how he supported the community via email and IRC. I liked how he defined constraints for the languages and how he looked for practical answers to the real world questions related to performance. I have played with Common Lisp for a month or so while reading SICP. I will try Clojure soon as well.
This is a post from Netflix engineering team where they talked about how their CD practices enable high availability. Honestly, I was expecting to read more about system design stuff. Still, I enjoyed this post. They talked about things like blue/green deployment, deployment windows,canary testing, rollbacks, and few others. If you are into DevOps space for long you might already know most of them. I think the key about these practices is not knowing them but actually applying them at your scale. You really need a solid engineering culture to practice them.
I prefer Monorepo over Polyrepo(one repo per service) approach. The author made a good point on the cost of monorepo rising steeply once your team size reaches 100~150 developers. Author suggests that you can start with a monorepo and if your development team velocity starts to take a hit because of code organization then you can split the monorepo into multiple repositories.
I work in an IT service organization where the biggest project that we do has 50-70 developers. After deliberating a lot I have started using the following structure. So, in my setup I have 4 git repos. Consider a team of 50 developers, none of the repos cross more than 25 developers working on a single repo. I have shared my thoughts on Monorepo here.
How To Rapidly Improve At Any Programming Language – Link
This makes sense. Reading closed PRs of an open source project can be a valuable source of knowledge. I might give it a try as well. I tried doing code reading earlier when I went over code of popular open source library Retrofit and blogged about lessons I learnt.
Learn how to do text processing with sed, awk, and grep
Why you should be deploying Postgres primarily on Kubernetes – Link
Covers many reasons why you may want to run Postgres on Kubernetes. Reasons include production ready configuration, connection pooling, automated backup, monitoring, high availability setup, and few others. Also, it talks about an interesting project called Stackgres that automates all of that. With a few lines of YAML you can have your full setup automated and running on Kubernetes in less than an hour.