The time to read this newsletter is 145 minutes.
There is only one way to happiness and that is to cease worrying about things which are beyond the power of our will – Epictetus
- Goodbye, Clean Code. 10 mins read. We sometimes get carried away with clean code.
- Clean code does not always produce simple code.
- As mentioned in the post, clean code is also a trade-off. You should weigh whether clean code adds real value or merely satisfies your inner clean-code ninja.
- Another important point in the post: always talk to the person whose code you want to refactor before you change it. Explain your rationale and then work with them to improve it.
- A healthy engineering team is constantly building trust. Rewriting your teammate’s code without a discussion is a huge blow to your ability to effectively collaborate on a codebase together.
- The No Code Delusion. 15 mins read. I agree with the author that we are still far away from pure no-code systems. Most no-code solutions fail when you want to do things slightly differently: they become limiting and productivity suffers. They are good for proofs of concept where you want to showcase value quickly.
- Why we’re writing machine learning infrastructure in Go, not Python: 10 mins read.
How To Optimize AWS Lambda Performance. 20 mins read.
6 Lessons learned from optimizing the performance of a Node.js service. 10 mins read.
To Serverless or Not To Serverless. 10 mins read. There is nothing new in this post if you already know the pros and cons of serverless, but it is a well-written article to keep handy when you need to make a decision about serverless.
A Scientific Approach to Capacity Planning: 20 mins read. This post talks about the Universal Scalability Law (USL), which accounts for the fact that computer systems don’t scale linearly because of queueing on shared resources (contention) and the cost of keeping shared state consistent (coherency).
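A minimal sketch of the USL throughput model, assuming its usual form X(N) = λN / (1 + σ(N - 1) + κN(N - 1)), where λ is the throughput of a single worker, σ the contention coefficient and κ the coherency coefficient. The coefficient values below are invented for illustration; in practice you would fit them to load-test measurements.

```python
import math


def usl_throughput(n: int, lam: float, sigma: float, kappa: float) -> float:
    """Throughput at concurrency n under the Universal Scalability Law.

    lam   -- throughput of a single worker (n = 1)
    sigma -- contention (serialization / queueing) coefficient
    kappa -- coherency (crosstalk / coordination) coefficient
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency where USL throughput peaks; adding workers beyond it lowers throughput."""
    return math.sqrt((1 - sigma) / kappa)


if __name__ == "__main__":
    # Hypothetical coefficients, as if fitted from load-test data.
    lam, sigma, kappa = 1000.0, 0.05, 0.001
    for n in (1, 8, 16, 32, 64, 128):
        print(n, round(usl_throughput(n, lam, sigma, kappa)))
    print("peak at N ~", round(usl_peak_concurrency(sigma, kappa)))
```

With κ > 0 the curve is retrograde: past the peak, throughput falls as you add capacity, which linear capacity models cannot express.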
The Curious Case of the Table-Locking UPDATE Query: 10 mins read. This post by a Heroku engineer talks about how he debugged an issue related to Postgres locks. Good read.
Radical Candor: Software Edition: 30 mins read. This is a long read that offers a lot of good advice on applying Radical Candor in software development. Radical Candor means caring personally while challenging directly: you show that you care and you give straight feedback. You have to build a relationship with people so that you can be honest with them.
The Myth of Architect as Chess Master: 10 mins read. I could relate to the author, as I often find myself in such situations. Software development is a cognitive activity, and most systems are built over years. So it is difficult to parachute into a system and provide meaningful advice without spending a lot of time understanding its context and the business drivers behind the project.
## Video of the week
The time to read this newsletter is 145 minutes.
Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat. – Sun Tzu
- Using the hunger I experienced as a kid to teach mine about generosity: 10 mins read. We often become picky when it comes to helping others and don’t want to offer the best we have. These are the best words I have read in a long time:
> When you give the best you have to someone in need, it translates into something much deeper to the receiver. It means they are worthy.
> If it’s not good enough for you, it’s not good enough for those in need either. Giving the best you have does more than feed an empty belly—it feeds the soul.
Calendar Versioning: 10 mins read. CalVer is a versioning convention based on your project’s release calendar, instead of arbitrary numbers.
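CalVer lets each project choose its own format. The sketch below assumes one common-looking scheme, YYYY.0M.MICRO (year, zero-padded month, then a counter that resets every month). The scheme choice and the helper name `next_calver` are hypothetical, used only for illustration.

```python
from datetime import date
from typing import Optional


def next_calver(today: date, previous: Optional[str] = None) -> str:
    """Return the next version in a YYYY.0M.MICRO CalVer scheme.

    MICRO starts at 0 and increments for each release within the same month;
    it resets to 0 whenever the year or month changes.
    """
    prefix = f"{today.year}.{today.month:02d}"
    if previous is not None:
        prev_year, prev_month, prev_micro = previous.split(".")
        if f"{prev_year}.{prev_month}" == prefix:
            return f"{prefix}.{int(prev_micro) + 1}"
    return f"{prefix}.0"


print(next_calver(date(2019, 11, 3)))               # -> 2019.11.0 (first release this month)
print(next_calver(date(2019, 11, 20), "2019.11.0"))  # -> 2019.11.1 (second release, same month)
print(next_calver(date(2019, 12, 1), "2019.11.1"))   # -> 2019.12.0 (new month resets MICRO)
```

The version number itself now tells users how old a release is, which is the main selling point of CalVer over arbitrary numbers.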
Doing a database join with CSV files: 10 mins read. xsv is a tool that you can use to join two CSV files. The author shows examples of inner, left, and right joins. Very useful indeed.
SQL, NoSQL, and Scale: How DynamoDB scales where relational databases don’t: 20 mins read. This post provides a good overview of why relational databases struggle to scale and how DynamoDB can be used to build web-scale applications.
Why databases use ordered indexes but programming uses hash tables: 15 mins read. This post explains why databases use B-trees while programs use hash tables. The main reasons the author gives are:
- Ordered data structures perform much better when n is large. With hash-based collections, collisions can degrade a lookup to O(n) in the worst case, and a range query is O(n) because every key must be examined.
- Ordering makes indexes more versatile: one ordered index can serve several kinds of queries (for example, an index on (a, b) also answers queries on a alone, as well as range queries). With hash tables, each access pattern needs its own index.
- Ordered collections have better locality of reference.
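To make the range-query point concrete, here is a small sketch contrasting a sorted list searched with Python’s `bisect` module (standing in for an ordered index such as a B-tree) with a plain `dict`, which has to look at every key to answer the same query. The data is made up.

```python
import bisect
from typing import Dict, List


def range_sorted(keys: List[int], lo: int, hi: int) -> List[int]:
    """Keys in [lo, hi] from a sorted list: O(log n) to find the bounds, then O(k) to copy."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return keys[start:end]


def range_hashed(table: Dict[int, str], lo: int, hi: int) -> List[int]:
    """Keys in [lo, hi] from a hash table: every one of the n keys is examined, then sorted."""
    return sorted(k for k in table if lo <= k <= hi)


ids = list(range(0, 1_000_000, 7))        # sorted, like an ordered index
table = {k: f"row-{k}" for k in ids}      # the same keys in a hash table
assert range_sorted(ids, 70, 100) == range_hashed(table, 70, 100)
print(range_sorted(ids, 70, 100))         # [70, 77, 84, 91, 98]
```

Both return the same answer; the difference is that the hash-table version does work proportional to the whole table, not to the size of the result.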
- Xor Filters: Faster and Smaller Than Bloom Filters: 15 mins read. In this post, the author describes Xor filters, which answer the question “is this item probably in the set?” (for example, whether a key exists in a cache) at the cost of occasional false positives. Usually we solve such problems with a hash-based collection, but Xor filters can solve them too. Xor filters take a bit longer to build, but once built they use less memory and are about 25% faster. Bloom filters and cuckoo filters are two other common approaches to this kind of problem.
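A faithful Xor filter is too involved to reproduce here, so the sketch below shows the simpler Bloom filter the post compares against: each item sets k bit positions in a bit array, and a lookup checks those same positions. It can return false positives but never false negatives. Deriving the k positions from one SHA-256 digest (double hashing) and the sizes used are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit halves of a single digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so h2 is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(num_bits=10_000, num_hashes=7)
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
print(bf.might_contain("beta"))   # True: added items are always found
print(bf.might_contain("delta"))  # almost certainly False, though a false positive is possible
```

The Xor filter’s advantage, per the post, is a smaller memory footprint and faster lookups for a comparable false-positive rate, in exchange for a slower, build-once construction step.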
Distributed architecture concepts I learned while building a large payments system: 20 mins read. The author describes important distributed systems concepts, covering consistency, durability, SLAs, and many more.
From 15,000 database connections to under 100: DigitalOcean’s tale of tech debt: 20 mins read. This post by DigitalOcean is a must-read for every developer. It describes how they incrementally moved from a legacy database-backed message queue to one based on RabbitMQ. Key points from the post are:
- Like GitHub, Shopify, and Airbnb, DigitalOcean began as a Rails application in 2011. The Rails application, internally known as Cloud, managed all user interactions in both the UI and public API. Aiding the Rails service were two Perl services: Scheduler and DOBE (DigitalOcean BackEnd). Scheduler scheduled and assigned Droplets to hypervisors, while DOBE was in charge of creating the actual Droplet virtual machines. While the Cloud and Scheduler ran as stand-alone services, DOBE ran on every server in the fleet.
- For four years, the database message queue formed the backbone of DigitalOcean’s technology stack. During this period, we adopted a microservice architecture, replaced HTTPS with gRPC for internal traffic, and ousted Perl in favor of Golang for the backend services. However, all roads still led to that MySQL database.
- By the start of 2016, the database had over 15,000 direct connections, each one querying for new events every one to five seconds. If that was not bad enough, the SQL query that each hypervisor used to fetch new Droplet events had also grown in complexity. It had become a colossus over 150 lines long and JOINed across 18 tables.
- When Event Router went live, it slashed the number of database connections from over 15,000 to less than 100.
- Unfortunately, removing the database’s message queue was not an easy feat. The first step was preventing services from having direct access to it. The database needed an abstraction layer.
- Now the real work began. Having complete control of the event system meant that Harpoon had the freedom to reinvent the Droplet workflow.
- Harpoon’s first task was to extract the message queue responsibilities from the database into itself. To do this, Harpoon created an internal messaging queue of its own that was made up of RabbitMQ and asynchronous workers. As of this writing in 2019, this is where the Droplet event architecture stands.
- Why do we need distributed systems?: 10 mins read. We build distributed systems because they offer:
- Better availability
- Better durability
- Better scalability
- Better efficiency
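A back-of-the-envelope illustration of the availability point, under a strong assumption: each replica is up with probability a, and failures are independent (in practice failures are often correlated, so real numbers are worse). A service that needs any one of n replicas is then up with probability 1 - (1 - a)^n.

```python
def availability_any_of(a: float, n: int) -> float:
    """Probability that at least one of n independently failing replicas is up."""
    return 1 - (1 - a) ** n


for n in (1, 2, 3):
    # Prints 0.990000, 0.999900 and 0.999999 for n = 1, 2, 3.
    print(n, f"{availability_any_of(0.99, n):.6f}")
```

Each extra replica of a 99% component adds roughly two nines, which is why redundancy across machines is the usual route to high availability.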
- On Kubernetes, Hybrid and Multi-cloud: 15 mins read. The key points in the post are:
- The first thing to consider is agility—cloud services offer significant advantages on how quickly you can spin infrastructure up and down, allowing you to concentrate on creating value on the software and data side.
- But the flip side of this agility is our second factor, which is cost. The agility and convenience of cloud infrastructure comes with a price premium that you pay over time, particularly for “higher level” services than raw compute and storage.
- The third factor is control. If you want full control over the hardware or network or security environment that your data lives in, then you will probably want to manage that on-premises.
Tools I discovered this week
- Broot: A CLI tool for getting an overview of a directory, even a big one. It is written in Rust. I use it as an alternative to
- xsv: A CLI tool for working with CSV files. It can concatenate, count, join, flatten, and do many other things. It is a Swiss Army knife for CSV, and it is written in Rust.
- pigz: A parallel implementation of gzip.
Video of the week
The time to read this newsletter is 135 minutes.
The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. – Stephen Hawking
- System design hack: Postgres is a great pub/sub & job server: 10 mins read. I have read multiple times that people are using Postgres as a job queue or as a pub/sub solution. It does require you to write SQL and PL/pgSQL functions, but I think it could be a good solution if you don’t want to manage a separate pub/sub server.
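One common way to build the job-queue half of this on Postgres (not necessarily the exact approach in the post) is `SELECT ... FOR UPDATE SKIP LOCKED`, available since PostgreSQL 9.5, which lets many workers claim different jobs without blocking on each other’s row locks. The sketch assumes a hypothetical `jobs(id, status, payload)` table and any DB-API 2.0 cursor, such as one from psycopg2.

```python
# Hypothetical schema: jobs(id bigserial primary key, status text, payload jsonb)
CLAIM_JOB_SQL = """
UPDATE jobs
   SET status = 'running'
 WHERE id = (
        SELECT id
          FROM jobs
         WHERE status = 'queued'
         ORDER BY id
         LIMIT 1
           FOR UPDATE SKIP LOCKED  -- rows locked by other workers are skipped, not waited on
       )
RETURNING id, payload;
"""


def claim_next_job(cursor):
    """Atomically claim the oldest queued job, or return None if none is available.

    `cursor` is any DB-API 2.0 cursor connected to PostgreSQL 9.5+. The caller
    commits the surrounding transaction once the claim should become visible.
    """
    cursor.execute(CLAIM_JOB_SQL)
    return cursor.fetchone()  # (id, payload), or None when the queue is empty
```

For the pub/sub half, Postgres’s built-in `LISTEN`/`NOTIFY` can wake idle workers instead of having them poll the table on a timer.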
- Developers mentoring other developers: practices I’ve seen work well: 20 mins read. The article covers how we can build good mentorship programs at work.
- Head In The Clouds: 15 mins read. This article covers how folks at FreeAgent planned their cloud migration journey. The key points from the post are:
- Co-locating has been a terrific win for us over the years, providing us with a cost-effective, high performance compute platform that has allowed us to scale to over 95,000 customers with close to 5 9’s reliability.
- Growth often acts as a forcing function with regards to infrastructure. Head count has doubled. Customer count is growing quickly.
- Desire for new features is another forcing function. They wanted more datacenters to increase resilience. They were reaching hardware limitations. The ops team was stretched, and it was challenging to find ops engineers with the right skills. They were experimenting with ML. Serverless was becoming a go-to for production. They wanted to improve deployment. And scaling the database was a challenge.
- Experiments were run to research moving to AWS: Granted, any infrastructure migration would be expensive, the project complex and it would come with many challenges, but the advantages and opportunities that a full cloud migration would open up in the future were undeniable.
- The decision was made to migrate to AWS!
- Early on in the R&D phase we became customers of Gruntwork.io and have relied heavily on their Infrastructure as Code library and training to accelerate the project.
- We built network isolation for 1,500 services to make Monzo more secure: 20 mins read. In the Security team at Monzo, one of our goals is to move towards a completely zero trust platform. This means that in theory, we’d be able to run malicious code inside our platform with no risk – the code wouldn’t be able to interact with anything dangerous without the security team granting special access.
- Scaling in the presence of errors—don’t ignore them: 20 mins read. The secret to error handling at scale isn’t giving up, ignoring the problem, or even trying again; it is structuring a program for recovery, making errors stand out, and allowing other parts of the program to make decisions. That means techniques like fail-fast, crash-only software, and process supervision, but also things like clever use of version numbers, and occasionally the odd bit of statelessness or idempotence. What these all have in common is that they’re all methods of recovery. Recovery is the secret to handling errors, especially at scale: giving up early so other things have a chance, continuing on so other things can catch up, restarting from a clean state to try again, saving progress so that things do not have to be repeated. That, or put it off for a while. Buy a lot of disks, hire a few SREs, and add another graph to the dashboard.
- Modern Data Practice and the SQL Tradition: 15 mins read. Over the last year I have read multiple posts suggesting that we should start with a relational database. SQL is becoming the de facto language for all things data. Many developers look at alternatives too early, before understanding the pros and cons of each technology. The key points from the post are:
- The more I work with existing NoSQL deployments however, the more I believe that their schemaless nature has become an excuse for sloppiness and unwillingness to dwell on a project’s data model beforehand.
- One can now model the “known” part of his data model in a typical relational manner and dump his “raw and unstructured” data into JSON columns. No need to “denormalize all the things” just because some element of the domain is “unstructured”.
- The good thing with this approach is that one can have a single database for both their structured and unstructured data without sacrificing ACID-compliance.
- SQL and relational databases have come a long way and nowadays offer almost any function a data scientist could ask for.
- Relational databases usually make more sense financially too. Distributed systems like MongoDB and ElasticSearch are money-hungry beasts and can kill your technology and human resources budget; unless you are absolutely certain and have run the numbers and decided that they do really make sense for your case.
- Performance and stability with relational databases can be better out of the box.
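The “relational columns plus JSON column” idea from the post can be sketched with nothing but Python’s built-in `sqlite3` module: real columns for the known part of the model, a JSON text column for the rest, and one query that filters on both. The post is about Postgres, where `jsonb` and its operators play this role; SQLite is swapped in here only so the example runs anywhere, and it assumes an SQLite build with JSON functions such as `json_extract` (built in since SQLite 3.38). The table and data are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Known, structured fields get real columns; the open-ended part goes into `attrs`.
conn.execute(
    """
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL,
        attrs TEXT NOT NULL  -- JSON document
    )
    """
)
rows = [
    (1, "T-shirt", 15.0, {"size": "M", "color": "red"}),
    (2, "Laptop", 999.0, {"ram_gb": 16, "cpu": "x86"}),
    (3, "Hoodie", 40.0, {"size": "L", "color": "red"}),
]
with conn:  # the whole batch is inserted in one transaction
    conn.executemany(
        "INSERT INTO products (id, name, price, attrs) VALUES (?, ?, ?, ?)",
        [(i, n, p, json.dumps(a)) for i, n, p, a in rows],
    )

# A relational predicate and a JSON predicate in the same query.
red_items = conn.execute(
    "SELECT name, price FROM products "
    "WHERE json_extract(attrs, '$.color') = ? AND price < ? ORDER BY id",
    ("red", 50),
).fetchall()
print(red_items)  # [('T-shirt', 15.0), ('Hoodie', 40.0)]
```

Rows without a `color` key (the laptop) simply yield NULL from `json_extract` and drop out of the filter, so there is no need to denormalize everything just because part of the data is unstructured.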
- Hash join in MySQL 8: 10 mins read. Read this post if you want to learn how databases implement hash joins. It gives a good, detailed understanding of the subject.
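The core of a classic in-memory hash join fits in a few lines: build a hash table on the join key of one input (ideally the smaller one), then probe it with each row of the other. This sketch leaves out what makes a real database implementation such as MySQL’s interesting, notably spilling to disk when the build side does not fit in memory. The tables are made up.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join: build a hash table on one input, then probe it with the other."""
    # Build phase: one pass over the build input, memory proportional to its size.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: one average-case O(1) lookup per probe row.
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            yield {**match, **row}


countries = [{"country_id": 1, "country": "NO"}, {"country_id": 2, "country": "SE"}]
cities = [
    {"city": "Oslo", "country_id": 1},
    {"city": "Bergen", "country_id": 1},
    {"city": "Paris", "country_id": 3},  # no matching country, so it is dropped
]
for r in hash_join(countries, cities, "country_id", "country_id"):
    print(r["city"], r["country"])
```

Compared with a nested-loop join, which compares every pair of rows, this does roughly one pass over each input, which is why it pays off for equi-joins without a usable index.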
- Managing a Go monorepo with Bazel: 10 mins read. I don’t think there is a clear winner yet between the monorepo and multi-repo approaches for building microservices. Big organisations like Google and Facebook prefer a monorepo, while organisations like Netflix recommend multiple repos. This post covers how you can manage a Go monorepo using the Bazel build tool. I have not used Bazel so far, but I am seriously considering it for my personal projects.
- The Value in Go’s Simplicity: 10 mins read. Go is one language that I really want to spend more time on; it is popular and used almost everywhere these days. In this post, the author makes the case for Go’s simplicity. As the author mentions, the Go core team has taken simplicity to another level: to keep the language simple, they have held back useful features such as generics (which eventually arrived in Go 1.18).
- When XML beats JSON: UI layouts: 5 mins read. UI layouts are represented as component trees. And XML is ideal for representing tree structures. It’s a match made in heaven! In fact, the most popular UI frameworks in the world (HTML and Android) use XML syntax to define layouts.
Video of the week
The time to read this newsletter is 145 minutes.
I am not bothered by the fact that I am unknown. I am bothered when I do not know others – Confucius
- What nobody tells you about documentation: 20 mins read. I like the way the author divides technical documentation into four buckets:
- Tutorials: learning-oriented.
- How-to guides: problem-oriented.
- Reference: information-oriented.
- Explanation: understanding-oriented.
- Test Desiderata: 5 mins read. This post is by Kent Beck, co-creator of JUnit. In it, Kent goes beyond the FIRST properties of tests.
- Tests should be coupled to the behavior of code and decoupled from the structure of code. Seeing tests that fail on both counts
- Don’t Call Yourself A Programmer, And Other Career Advice: 20 mins read.
- Testing Cloudflare workers: 20 mins read.
- Automated Disaster Recovery using CloudEndure: 10 mins read.
- AWS Lambda vs. Azure Functions: 10 Major Differences: 15 mins read.
- Daily Stand-up Injection of Guilt: 10 mins read. Yegor writes, “Only weak managers need daily stand-up meetings to coordinate the team, while strong ones use more formal instruments to organize the flow of information. However, as someone noted, morning meetings are not supposed to be used by managers to coordinate anyone, but “to discuss progress, impediments and to plan.” I’m not buying it.”
- Why Parcel Has Become My Go-To Bundler for Development: 10 mins read. An easy-to-read tutorial on getting started with Parcel, a zero-configuration bundler. It looks really simple to use and comes with useful defaults.
- Storing 50 million events per second in Elasticsearch: How we did it: 20 mins read. A very detailed post on how DataDome stores 50 million events per second so that its customers can analyze and search them over a 30-day period.
- How Shopify Manages Petabyte Scale MySQL Backup and Restore: 15 mins read.
The time to read this newsletter is 200 minutes.
Religion is the opium of the masses – Karl Marx
- A Technical Introduction to MemSQL: 20 mins read. MemSQL is a fast, commercial, ANSI SQL-compliant, highly scalable HTAP database. HTAP databases are those that support both OLTP and OLAP workloads. It supports ACID transactions just like a regular relational database. It also supports document and geospatial data types. I have also written a quick post on MemSQL that you can read.
It’s later than you think: 20 mins read. We all regret working too hard in the end. Give it a read; it is an awesome write-up of a heartbreaking story.
Modern applications at AWS: 10 mins read. To succeed in using application development to increase agility and innovation speed, organizations must adopt five elements, in any order: microservices; purpose-built databases; automated software release pipelines; a serverless operational model; and automated, continuous security.
1 Year of Event Sourcing and CQRS: 30 mins read. This is a long read that covers DDD, CQRS, and Event Sourcing. The author covers how they implemented this architectural style and the issues they faced.
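The central idea of event sourcing is that current state is not stored directly but derived by replaying an append-only log of events. A minimal sketch using a hypothetical bank-account example; the event names are made up and are not from the post.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    kind: str    # "Deposited" or "Withdrawn" (hypothetical event types)
    amount: int


def apply(balance: int, event: Event) -> int:
    """Pure state transition: old state + event -> new state."""
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


# The event log is the source of truth; it is only ever appended to.
log = [Event("Deposited", 100), Event("Withdrawn", 30), Event("Deposited", 5)]

# Current state is a fold over the full history...
print(reduce(apply, log, 0))      # 75
# ...and a fold over a prefix gives the state as of any earlier point in time.
print(reduce(apply, log[:2], 0))  # 70
```

Roughly speaking, in CQRS the read models are further projections of the same log, kept up to date as new events are appended.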
The Single Most Important Internal Email in the History of Amazon: 20 mins read. This is a long read on how different organisations are structured. Some organisations are co-located and prefer synchronous communication, while others are distributed and rely on asynchronous communication. An organization’s communication system can be one of the most important levers you have for improving productivity. Be very intentional about it.
Lessons from Design School for Software Engineers: 20 mins read. Great advice from an engineer at GitHub. All the lessons resonated with me.
- You are not your audience
- Constructive, objective feedback is always better than reductive, subjective feedback
- You are not your designs/work
- Iteration is key for improvement
- Always critique your work
- A Multithreaded Fork of Redis That’s 5X Faster Than Redis: 20 mins read. This is interesting: KeyDB is a fork of Redis that uses multithreading to be 5x faster than Redis, according to its authors. From the post:
> KeyDB has a different philosophy on how the codebase should evolve. We feel that ease of use, high performance, and a “batteries included” approach is the best way to create a good user experience. While we have great respect for the Redis maintainers it is our opinion that the Redis approach focusses too much on simplicity of the code base at the expense of complexity for the user. This results in the need for external components and workarounds to solve common problems.
Why we decided to go for the Big Rewrite: 20 mins read. This post goes into detail on how Channable rewrote their main data processing system. It has a lot of good advice that you can apply in your own work.
How to Write Fast Code in Ruby on Rails: 15 mins read. This post contains general advice on writing fast Ruby code. Many of the lessons apply even if you use another programming language.
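Cascading Cache Invalidation: 25 mins read. This is an interesting article covering a flaw in one of the best practices most people use for asset caching, i.e. content hashes in filenames combined with far-future expiry. Because a file that references another file embeds its hashed name, changing one file changes the hash, and invalidates the cache entry, of every file that depends on it. The author also shares three possible solutions to the problem.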
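The cascade is easy to reproduce. In the sketch below (hypothetical file names, with an 8-character SHA-256 prefix as the content hash), only `util.js` changes between two builds, yet `app.js` also gets a new hashed name, because its content includes the hashed import path of `util.js`.

```python
import hashlib
from typing import Dict


def hashed_name(name: str, content: str) -> str:
    """Content-addressed filename, e.g. app.js -> app.<8 hex chars>.js."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:8]
    stem, ext = name.rsplit(".", 1)
    return f"{stem}.{digest}.{ext}"


def build(util_source: str) -> Dict[str, str]:
    """Hash util.js first, then embed its hashed name in app.js before hashing app.js."""
    util = hashed_name("util.js", util_source)
    app_source = f'import {{ helper }} from "./{util}";\nhelper();\n'
    return {"util.js": util, "app.js": hashed_name("app.js", app_source)}


v1 = build("export const helper = () => 1;")
v2 = build("export const helper = () => 2;")  # only util.js changed
for name in ("util.js", "app.js"):
    print(name, v1[name], "->", v2[name])       # both names change
```

In a real dependency graph, the same effect propagates all the way up to the entry points, so a one-line change deep in a shared module can force users to re-download many files.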
Video of the week
This week’s video: Intel and Rust: the Future of Systems Programming
The time to read this newsletter is 130 minutes.
A busy calendar and a busy mind will destroy your ability to create anything great. – Naval Ravikant
- GitHub stars won’t pay your rent: 20 mins read. The key point in the post is that you should not feel bad about charging money for your work. I think we software developers have taken it too far. Most of us feel that by making our work open source we are making the world better. But the reality is that if you lose your job and need financial support, no user of your open source project will come to help. We need to be practical and keep financial reality in mind.
- Building a Kubernetes platform at Pinterest: 15 mins read. You can learn a lot about Kubernetes from this post by the Pinterest engineering team. The key points for me are:
- You can use CRDs (Custom Resource Definitions) to define your organisation-specific services. Look at
- CRDs can be used as an alternative to Helm
- Infrastructure team has three main priorities: 1) Service Reliability 2) Developer Productivity 3) Infra Efficiency
- Six Shades of Coupling: 15 mins read.
- When Redundancy Actually Helps: 10 mins read.
- The (not so) hidden cost of sharing code between iOS and Android: 10 mins read. So, we have come full circle: organisations are moving away from code sharing when building the same application for different mobile platforms. I have seen multiple organisations use C++ to write shared code. Using C++ limits the number of developers you can hire and slows you down overall, and you have to build tools to support your custom approach.
- 3 Strategies for implementing a microservices architecture: 5 mins read. The three strategies are:
- The Strangler method
- The Lego strategy
- The nuclear option
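Of the three, the Strangler method is the easiest to picture in code: a routing layer in front of the monolith sends paths that have already been migrated to new services and everything else to the legacy application, and the set of migrated paths grows over time. The prefixes and backend names below are hypothetical.

```python
# Hypothetical routing table: path prefixes already "strangled" out of the monolith.
MIGRATED_PREFIXES = {
    "/api/payments": "payments-service",
    "/api/search": "search-service",
}
LEGACY_BACKEND = "monolith"


def route(path: str) -> str:
    """Pick a backend for a request path; the longest matching migrated prefix wins."""
    matches = [p for p in MIGRATED_PREFIXES if path == p or path.startswith(p + "/")]
    if not matches:
        return LEGACY_BACKEND  # anything not yet migrated still goes to the monolith
    return MIGRATED_PREFIXES[max(matches, key=len)]


for p in ("/api/payments/123", "/api/search", "/api/users/7"):
    print(p, "->", route(p))
```

Migrating another slice of functionality then becomes a one-line change to the routing table, and rolling back is just as small.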
- Microservices, Apache Kafka, and Domain-Driven Design: 20 mins read.
- Habits vs. Goals: A Look at the Benefits of a Systematic Approach to Life: 10 mins read.
- Building an analytics stack from scratch: 15 mins read.
- Cutting Through Indecision & Overthinking: 10 mins read. Take action. Half the battle is won if you get started.
Video of the week
The time to read this newsletter is 210 minutes.
The general who wins a battle makes many calculations in his temple before the battle is fought. – Sun Tzu
- All the best engineering advice I stole from non-technical people – 20 mins read. The points that resonated with me:
- Know what people are asking you to be an expert in. This helps you avoid straying too far into other people’s territory.
- Thinking is also work. This is especially true when you move to management.
- Effective teams need trust. That’s not to say that frameworks for decision making or metrics tracking are not useful, they are critical — but replacing trust with process is called bureaucracy.
- Fast and flexible observability with canonical log lines – 20 mins read. Canonical logging is a simple technique where, in addition to their normal log traces, requests also emit one long log line at the end that includes many of their key characteristics. The key points for me in this post are:
- Use logfmt to make logs machine-readable.
- We use canonical log lines to help address this. They’re a simple idea: in addition to their normal log traces, requests (or some other unit of work that’s executing) also emit one long log line at the end that pulls all its key telemetry into one place.
- Canonical lines are an ergonomic feature. By colocating everything that’s important to us, we make it accessible through queries that are easy for people to write, even under the duress of a production incident
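A minimal sketch of building a canonical log line in logfmt (`key=value` pairs separated by spaces, with values quoted when they contain spaces, quotes, or `=`). The handler and its field names are hypothetical and only illustrate the shape of the idea.

```python
import time


def logfmt(fields: dict) -> str:
    """Render a dict as one logfmt line, quoting values that contain spaces, quotes or '='."""
    parts = []
    for key, value in fields.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        if text == "" or any(ch in text for ch in ' "='):
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def handle_request(method: str, path: str) -> str:
    """Hypothetical handler that accumulates telemetry and returns its canonical line."""
    start = time.monotonic()
    telemetry = {"canonical_log_line": True, "http_method": method, "http_path": path}
    # Real handling would add fields here as it learns them (user, rate-limit state, ...).
    telemetry["status"] = 200
    telemetry["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
    # Emitted once, at the end, in addition to the request's normal log lines.
    return logfmt(telemetry)


print(handle_request("GET", "/v1/charges"))
print(logfmt({"msg": 'said "hi"', "n": 3}))  # msg="said \"hi\"" n=3
```

In a real service, the returned string would go to the normal logger. Because every request produces exactly one such line, questions like “p99 duration per endpoint” become a single query over one line type.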
- Why Some Platforms Thrive and Others Don’t – 25 mins read. When evaluating an opportunity involving a platform, entrepreneurs (and investors) should analyze the basic properties of the networks it will use and consider ways to strengthen network effects. It’s also critical to evaluate the feasibility of minimizing multi-homing, building global network structures, and using network bridging to increase scale while mitigating the risk of disintermediation. That exercise will illuminate the key challenges of growing and sustaining the platform and help businesspeople develop more-realistic assessments of the platform’s potential to capture value
- How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh – 30 mins read. A long read that makes the case for a distributed data mesh by applying DDD principles to the design of a data lake. A refreshing take on how to design data lakes.
- Re-Architecting the Video Gatekeeper – 15 mins read. This post covers how one of Netflix’s teams used Hollow to improve the performance of their service. Hollow is a total, high-density near cache built by Netflix. The post explains why a near cache suited their use case. It is a detailed post covering both the existing and the new architecture. I learnt a lot while reading it.
- Our not-so-magic journey scaling low latency, multi-region services on AWS – 20 mins read. This is another detailed post, covering how Atlassian built a low-latency service. They first tried DynamoDB, but that didn’t cut it for them, so they also added a Caffeine-based near cache to achieve the numbers expected from their service.
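Both of the last two posts rely on a near cache: hot data kept in the service’s own memory, with the remote store consulted only on a miss. The sketch below is a toy read-through cache with a per-entry time-to-live. It is not Caffeine or Hollow (Caffeine adds bounded size and smarter eviction; Hollow keeps a whole dataset in memory), and `slow_remote_lookup` is a hypothetical stand-in for a network call such as a database read.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class NearCache:
    """In-process read-through cache with a fixed time-to-live (TTL) per entry."""

    def __init__(self, load: Callable[[Hashable], object], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load          # slow remote lookup, called only on a miss
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested deterministically
        self._entries: Dict[Hashable, Tuple[float, object]] = {}  # unbounded, for brevity

    def get(self, key: Hashable) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh local copy: no network round trip
        value = self._load(key)    # miss or expired: go to the remote store
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def slow_remote_lookup(key):
    """Hypothetical stand-in for a remote read; records each call it receives."""
    calls.append(key)
    return f"value-for-{key}"


fake_now = [0.0]
cache = NearCache(slow_remote_lookup, ttl_seconds=30, clock=lambda: fake_now[0])
cache.get("a")
cache.get("a")      # served from local memory
fake_now[0] = 31.0
cache.get("a")      # entry expired, so the remote store is hit again
print(calls)        # ['a', 'a']
```

The TTL is the knob that trades freshness for latency: a longer TTL means fewer remote calls but a longer window in which a replica may serve stale data.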
- Making Containers More Isolated: An Overview of Sandboxed Container Technologies – 25 mins read.
- Benchmarking: Do it with Transparency or don’t do it at all – 20 mins read. This is a detailed rebuttal by the OnGres team of a MongoDB blog post that dismissed the benchmark report OnGres had created.
- Deconstructing the Monolith: Designing Software that Maximizes Developer Productivity – 20 mins read. This article by Shopify is a must read for anyone planning to adopt Microservices architecture. It is practical and pragmatic. The key points for me in this post are:
- Application architecture evolves over time. The right way to think about that evolution is Monolith -> Modular monolith -> Microservices.
- Monolithic architecture has many advantages.
- Monolithic architecture can take an application very far since it’s easy to build and allows teams to move very quickly in the beginning to get their product in front of customers earlier.
- You’ll only need to maintain one repository, and be able to easily search and find all functionality in one folder.
- It also means only having to maintain one test and deployment pipeline, which, depending on the complexity of your application, may avoid a lot of overhead.
- One of the most compelling benefits of choosing the monolithic architecture over multiple separate services is that you can call into different components directly, rather than needing to communicate over web service APIs.
- Disadvantages of Monolithic architecture
- As the system grows, the challenge of building and testing new features increases.
- High coupling and a lack of boundaries
- Developing in Shopify required a lot of context to make seemingly simple changes. When new Shopifolk onboarded and got to know the codebase, the amount of information they needed to take in before becoming effective was massive
- Microservices architecture increases deployment and operational complexity. The tools that work great for monolithic codebases stop working with a microservices architecture.
- A modular monolith is a system where all of the code powers a single application and there are strictly enforced boundaries between different domains.
- Approach to move to Modular monolith
- Reorganize code by real-world concepts and boundaries
- Ensure all tests work after reorganisation
- Build tools that help track progress of each component towards its goal of isolation. Shopify developed a tool called Wedge that highlights any violations of domain boundaries (when another component is accessed through anything but its publicly defined API), and data coupling across boundaries
- According to Martin Fowler, “almost all the cases where I’ve heard of a system that was built as a microservice system from scratch, it has ended in serious trouble… you shouldn’t start a new project with microservices, even if you’re sure your application will be big enough to make it worthwhile.”
- “It’s dead, Jim”: How we write an incident postmortem – 15 mins read. I believe writing a postmortem is a good exercise even if you don’t follow SRE practices. The key points for me in this post are:
- A postmortem is the process by which we learn from failure, and a way to document and communicate those lessons.
- Why write one?
- It allows us to document the incident, ensuring that it won’t be forgotten.
- They are the most effective mechanism we can use to drive improvement in our infrastructure.
- You should share postmortems because your customers deserve to know why their services didn’t behave as expected
- We shouldn’t be satisfied with identifying what triggered an incident (after all, there is no root cause), but should use the opportunity to investigate all the contributing factors that made it possible, and/or how our automation might have been able to prevent this from ever happening.
- What we want is to learn why our processes allowed for that mistake to happen, to understand if the person that made a mistake was operating under wrong assumptions.