> When you give the best you have to someone in need, it translates into something much deeper to the receiver. It means they are worthy.
> If it’s not good enough for you, it’s not good enough for those in need either. Giving the best you have does more than feed an empty belly—it feeds the soul.
Calendar Versioning: 10 mins read. CalVer is a versioning convention based on your project’s release calendar, instead of arbitrary numbers.
Doing a database join with CSV files: 10 mins read. xsv is a tool that you can use to join two CSV files. The author shows examples of inner join, left join, and right join. Very useful indeed.
Ordered data structures perform much better when n is large. With hash based collections, one collision can cause O(n) performance. Range queries becomes O(n) if implemented using hash tables
Ordering helps in indexes and we can reuse one index in multiple ways. With hash tables, we have to implement separate indexes
Ordered collection achieve locality of reference.
Xor Filters: Faster and Smaller Than Bloom Filters: 15 mins read. In this post, author talks about Xor filters to solve problems where you need to check whether an item exist in cache or not. Usually we solve such problems using a hash based collection but this can be solve using Xor filters as well. Xor filters take a bit longer to build, but once built, it uses less memory and is about 25% faster. Bloom filters and cuckoo filters are two other common approaches to solve these kind of problem as well.
Like GitHub, Shopify, and Airbnb, DigitalOcean began as a Rails application in 2011. The Rails application, internally known as Cloud, managed all user interactions in both the UI and public API. Aiding the Rails service were two Perl services: Scheduler and DOBE (DigitalOcean BackEnd). Scheduler scheduled and assigned Droplets to hypervisors, while DOBE was in charge of creating the actual Droplet virtual machines. While the Cloud and Scheduler ran as stand-alone services, DOBE ran on every server in the fleet.
For four years, the database message queue formed the backbone of DigitalOcean’s technology stack. During this period, we adopted a microservice architecture, replaced HTTPS with gRPC for internal traffic, and ousted Perl in favor of Golang for the backend services. However, all roads still led to that MySQL database.
By the start of 2016, the database had over 15,000 direct connections, each one querying for new events every one to five seconds. If that was not bad enough, the SQL query that each hypervisor used to fetch new Droplet events had also grown in complexity. It had become a colossus over 150 lines long and JOINed across 18 tables.
When Event Router went live, it slashed the number of database connections from over 15,000 to less than 100.
Unfortunately, removing the database’s message queue was not an easy feat. The first step was preventing services from having direct access to it. The database needed an abstraction layer.
Now the real work began. Having complete control of the event system meant that Harpoon had the freedom to reinvent the Droplet workflow.
Harpoon’s first task was to extract the message queue responsibilities from the database into itself. To do this, Harpoon created an internal messaging queue of its own that was made up of RabbitMQ and asynchronous workers. As of this writing in 2019, this is where the Droplet event architecture stands.
The first thing to consider is agility—cloud services offer significant advantages on how quickly you can spin infrastructure up and down, allowing you to concentrate on creating value on the software and data side.
But the flip side of this agility is our second factor, which is cost. The agility and convenience of cloud infrastructure comes with a price premium that you pay over time, particularly for “higher level” services than raw compute and storage.
The third factor is control. If you want full control over the hardware or network or security environment that your data lives in, then you will probably want to manage that on-premises.
Tools I discovered this week
Broot: It is a CLI tool that you can use to get an overview of a directory, even a big one. It is written in Rust programming language. I use it as an alternative to tree command.
xsv: It is a CLI tool for working with CSV files. It can concatenate, count, join, flatten, and many other things. It is Swiss army tool for CSV. It is written in Rust programming language.
The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. – Stephen Hawking
System design hack: Postgres is a great pub/sub & job server: 10 mins read. I have read multiple times that people are using Postgres as a job queue or as a pub/sub solution. It does require you to mess with SQL and write PSQL functions but I think it could be a good solution if you don’t want to manage some other pub/sub server.
Head In The Clouds: 15 min read. This articles covers how folks at FreeAgent planned their cloud migration journey. The key points from the post are:
Co-locating has been a terrific win for us over the years, providing us with a cost-effective, high performance compute platform that has allowed us to scale to over 95,000 customers with close to 5 9’s reliability.
Growth often acts as a forcing function with regards to infrastructure. Head count has doubled. Customer count is growing quickly.
Desire for new features is another forcing function. They wanted more datacenters to increase resilience. They were reaching hardware limitations. The ops team was pressed and it was challenging to find ops engineers with the right skills. They were experimenting with ML. Serverless was becoming a go to for production. They wanted to improve deployment. And scaling the database was a challenge.
Experiments were run to research moving to AWS: Granted, any infrastructure migration would be expensive, the project complex and it would come with many challenges, but the advantages and opportunities that a full cloud migration would open up in the future were undeniable.
The decision was made to migrate to AWS!
Early on in the R&D phase we became customers of Gruntwork.io and have relied heavily on their Infrastructure as Code library and training to accelerate the project.
We built network isolation for 1,500 services to make Monzo more secure: 20 mins read. In the Security team at Monzo, one of our goals is to move towards a completely zero trust platform. This means that in theory, we’d be able to run malicious code inside our platform with no risk – the code wouldn’t be able to interact with anything dangerous without the security team granting special access.
Scaling in the presence of errors—don’t ignore them: 20 mins read. The secret to error handling at scale isn’t giving up, ignoring the problem, or even it trying again—it is structuring a program for recovery, making errors stand out, allowing other parts of the program to make decisions. Techniques like fail-fast, crash-only-software, process supervision, but also things like clever use of version numbers, and occasionally the odd bit of statelessness or idempotence. What these all have in common is that they’re all methods of recovery. Recovery is the secret to handling errors. Especially at scale. Giving up early so other things have a chance, continuing on so other things can catch up, restarting from a clean state to try again, saving progress so that things do not have to be repeated. That, or put it off for a while. Buy a lot of disks, hire a few SREs, and add another graph to the dashboard.
Modern Data Practice and the SQL Tradition: 15 mins read. Over the last one year I have read multiple posts suggesting we should start with relational database route. SQL is becoming the defacto language for all things data. Most developers start looking at alternatives too early in the cycle before understanding pros and cons of using a technology. The key points from the post are:
The more I work with existing NoSQL deployments however, the more I believe that their schemaless nature has become an excuse for sloppiness and unwillingness to dwell on a project’s data model beforehand.
One can now model the “known” part of his data model in a typical relational manner and dump his “raw and unstructured” data into JSON columns. No need to “denormalize all the things” just because some element of the domain is “unstructured”.
The good thing with this approach is that one can have a single database for both their structured and unstructured data without sacrificing ACID-compliance.
SQL and relational databases have come a long way and nowadays offer almost any function a data scientist could ask.
Relational databases usually make more sense financially too. Distributed systems like MongoDB and ElasticSearch are money-hungry beasts and can kill your technology and human resources budget; unless you are absolutely certain and have run the numbers and decided that they do really make sense for your case.
Performance and stability with relational databases can be better out of the box
Hash join in MySQL 8: 10 mins read. You should read this blog if you want to learn how hash joins are implemented by databases. It will give you a good and detailed understanding on the subject.
Managing a Go monorepo with Bazel: 10 mins read. I don’t think we still have a winner between monorepo and multiple repo approach when building Microservices. We have big organisations like Google and Facebook that prefer Monorepo approach and then we have organizations like Netflix that recommend multi repo approach. This post covers how you can manage a Go monorepo using Bazel build tool. I have not used Bazel so far but I am seriously considering it for my personal projects.
The Value in Go’s Simplicity: 10 mins read. Go is one language that I really want to spend more time on. It is a popular language used almost everywhere these days. In this blog, author makes the case for Go’s simplicity. As author mentioned, Go core development team has take simplicity to another level. To keep language simple they are not allowing many good features like Generics implemented in Go.
When XML beats JSON: UI layouts: 5 mins read. UI layouts are represented as component trees. And XML is ideal for representing tree structures. It’s a match made in heaven! In fact, the most popular UI frameworks in the world (HTML and Android) use XML syntax to define layouts.
A Technical Introduction to MemSQL: 20 mins read. MemSQL is s fast, commercial, ANSI SQL compliant, highly scalable HTAP database. HTAP databases are those that support both OLTP and OLAP workloads. It supports ACID transactions just like a regular relational database .It also supports document and geospatial data types. I have also written a quick post on MemSQL that you can read.
It’s later than you think: 20 mins read. We all regret working too hard in the end. Give it a read it is an awesome write up on a heart breaking story.
Modern applications at AWS: 10 mins read. To succeed in using application development to increase agility and innovation speed, organizations must adopt five elements, in any order: microservices; purpose-built databases; automated software release pipelines; a serverless operational model; and automated, continuous security.
1 Year of Event Sourcing and CQRS: 30 mins read. This is a long read that covers DDD, CQRS, and Event Sourcing. In this post author covered how they implemented this architecture style and issues they faced.
The Single Most Important Internal Email in the History of Amazon: 20 mins read. This is a long read on how different organisations are organised. Some organisations are collocated and prefer synchronous mode of communication while others are distributed with asynchronous mode of communications. An organization’s communication system can be one of the most important leverages you can have to make an impact on productivity. Be very intentional about it.
> KeyDB has a different philosophy on how the codebase should evolve. We feel that ease of use, high performance, and a “batteries included” approach is the best way to create a good user experience. While we have great respect for the Redis maintainers it is our opinion that the Redis approach focusses too much on simplicity of the code base at the expense of complexity for the user. This results in the need for external components and workarounds to solve common problems.
Why we decided to go for the Big Rewrite: 20 mins. This post goes into detail how channable did rewrite of their main data processing system. It has a lot of good advice that you can apply in your work as well.
How to Write Fast Code in Ruby on Rails: 15 mins read. This post contains general advice to write fast and performant Ruby code. Many of the lessons can be applied even if you use any other programming language.
Cascading Cache Invalidation: 25 mins read. This is an interesting article covering flaw in one of the best practice most people use for asset caching i.e content hashes in filenames and far-future expiry. Author also shared three possible solutions to the problem.
Video of the week
This week video: Intel and Rust: the Future of Systems Programming
A busy calendar and a busy mind will destroy your ability to create anything great. – Naval Ravikant
GitHub stars won’t pay your rent: 20 mins read. The key point in the post is that you should not feel bad about charging money for your work. I think we software developers have taken it too far. Most of us feel that by making our work open source we are making the world better. But, the reality is that if you loose your job and need financial support then no user of your open source project will come to help. We need to become practical and keep financial reality in mind.
The (not so) hidden cost of sharing code between iOS and Android: 10 mins read. So, we have come back the full circle. Organisations are moving away from code sharing approach when building same application for different mobile platforms. I have seen multiple organisations using C++ to write share code. The use of C++ limits number of developers you can find in the market and overall slows you down. You have to build tools to support your custom journey.
Know what people are asking you to be an expert in. This helps you avoid getting too much into other people territory.
Thinking is also work. This is especially true when you move to management.
Effective teams need trust. That’s not to say that frameworks for decision making or metrics tracking are not useful, they are critical — but replacing trust with process is called bureaucracy.
Fast and flexible observability with canonical log lines – 20 mins read. Canonical logging is a simple technique where in addition to their normal log traces, requests also emit one long log line at the end that includes many of their key characteristics. The key points for me in this post are:
Use logfmt to make logs machine readable
We use canonical log lines to help address this. They’re a simple idea: in addition to their normal log traces, requests (or some other unit of work that’s executing) also emit one long log line at the end that pulls all its key telemetry into one place.
Canonical lines are an ergonomic feature. By colocating everything that’s important to us, we make it accessible through queries that are easy for people to write, even under the duress of a production incident
Why Some Platforms Thrive and Others Don’t – 25 mins read. When evaluating an opportunity involving a platform, entrepreneurs (and investors) should analyze the basic properties of the networks it will use and consider ways to strengthen network effects. It’s also critical to evaluate the feasibility of minimizing multi-homing, building global network structures, and using network bridging to increase scale while mitigating the risk of disintermediation. That exercise will illuminate the key challenges of growing and sustaining the platform and help businesspeople develop more-realistic assessments of the platform’s potential to capture value
Re-Architecting the Video Gatekeeper – 15 mins read. This post cover how one of the Netflix tech team used Hollow to improve performance of their service. Hollow is a total high-density cache built by Netflix. The post covers why near cache was suitable for their use case. This is a detailed post covering the existing and new architecture. I learnt a lot while reading this article.
Application architecture evolve over time. The right way to think about evolution is to go from Monolith -> Modular monolith -> Microservices.
Monolithic architecture has many advantages.
Monolithic architecture can take an application very far since it’s easy to build and allows teams to move very quickly in the beginning to get their product in front of customers earlier.
You’ll only need to maintain one repository, and be able to easily search and find all functionality in one folder.
It also means only having to maintain one test and deployment pipeline, which, depending on the complexity of your application, may avoid a lot of overhead.
One of the most compelling benefits of choosing the monolithic architecture over multiple separate services is that you can call into different components directly, rather than needing to communicate over web service API’s
Disadvantages of Monolithic architecture
As system grows challenge of building and testing new features increases
High coupling and a lack of boundaries
Developing in Shopify required a lot of context to make seemingly simple changes. When new Shopifolk onboarded and got to know the codebase, the amount of information they needed to take in before becoming effective was massive
Microservices architecture increases deployment and operational complexity. The tools that works great for monolithic code bases stop working with Microservices architecture.
A modular monolith is a system where all of the code powers a single application and there are strictly enforced boundaries between different domains.
Approach to move to Modular monolith
Reorganize code by real-world concepts and boundaries
Ensure all tests work after reorganisation
Build tools that help track progress of each component towards its goal of isolation. Shopify developed a tool called Wedge that highlights any violations of domain boundaries (when another component is accessed through anything but its publicly defined API), and data coupling across boundaries
According to Martin Fowler, “almost all the cases where I’ve heard of a system that was built as a microservice system from scratch, it has ended in serious trouble… you shouldn’t start a new project with microservices, even if you’re sure your application will be big enough to make it worthwhile
A postmortem is the process by which we learn from failure, and a way to document and communicate those lessons.
Why to write one?
It allows us to document the incident, ensuring that it won’t be forgotten.
They are the most effective mechanism we can use to drive improvement in our infrastructure.
You should share postmortems because your customers deserve to know why their services didn’t behave as expected
We shouldn’t be satisfied with identifying what triggered an incident (after all, there is no root cause), but should use the opportunity to investigate all the contributing factors that made it possible, and/or how our automation might have been able to prevent this from ever happening.
What we want is to learn why our processes allowed for that mistake to happen, to understand if the person that made a mistake was operating under wrong assumptions.
We change our behavior when the pain of staying the same becomes greater than the pain of changing. Consequences give us the pain that motivates us to change. — Henry Cloud
Advertising is a cancer on society: 20 mins read. This is a long read. Author makes many valid points on why Advertising should be consider cancer. Advertising is a cancer because it has symptoms mentioned below.
Announcing PartiQL: One query language for all your data – 20 mins read. Looks like finally we have found a way to standardise on SQL for working across different data storage solutions be it RDBMS or NoSQL or File based. PartiQL extends SQL by adding minimal extensions required for working with different data models. SQL won! Like it or not SQL is still the best and most powerful query language.
Parallelism in PostgreSQL – 15 mins read. The post covers how modern Postgres implements parallelism for sequential scans, aggregations, and B-tree scans.
Who Actually Feels Satisfied About Money? – 20 mins read. The post makes a good point on anxiety people have regarding money. More money does not always translate to more happiness. It’s not just how much you have — it’s what you do with it.
Top Seven Myths of Robust Systems – 15 mins read. The number one myth we hear out in the field is that if a system is unreliable, we can fix that with redundancy. In some cases, redundant systems do happen to be more robust. In others, this is demonstrably false. It turns out that redundancy is often orthogonal to robustness, and in many cases it is absolutely a contributing factor to catastrophic failure. The problem is, you can’t really tell which of those it is until after an incident definitively proves it’s the latter.
Safely Rewriting Mixpanel’s Highest Throughput Service in Golang – 15 mins read. This post covers how Mixpanel made use of Diffy to safely migrate high throughput service from Python to Golang. Diffy is a service that accepts HTTP requests, and forwards them to two copies of an existing HTTP service and one copy of a candidate HTTP service.
Stateful data is hard. Don’t try to reinvent AWS RDS. Stateful sets have limitations.
Upgrading Kubernetes is hard. The advice is to run more than Kubernetes cluster in production.
Managed Kubernetes does not take away all the problems.
When a rewrite isn’t: rebuilding Slack on the desktop – 15 mins read. The approach used was at once incremental and all-encompassing, rewriting a piece at a time into a gradually growing “modern” section of the application that utilized React and Redux. And the results? 50% reduction of memory use and 33% improvement in load time
Video of the week
In a large engineering organization writing is the only medium that will help you propagate your message forward.
You can learn to improve your writing skills. It is a learnable skill.
Writing code is not the only activity in software development.
When you write things down, you build better understanding of the topic. I personally find that writing helps me think clearly about a problem.
Why Github used Haskell for Semantic? – 20 mins read. We need more such posts from the community. These type of posts can help developers understand how organizations take technical choices. The key lessons for me in this post are:
The problem Github is trying to solve with Semantic is related to domain of programming language theory. This domain is an active research area and most of the researchers in this domain use Haskell for its brevity, power, and focus on correctness. Writing in Haskell allows us to build on top of the work of others rather than getting stuck in a cycle of reading, porting, and bug-fixing
Haskell makes it nigh-impossible to build programs that contain such bugs
Haskell used in industry at scale. Facebook open sourced a project called Haxi that is written in Haskell.
Let’s build a SQL parser in Go!: 20 mins read. I enjoyed reading this post. It shows in a step by step manner how to write a SQL parser. The author implemented it in Go.
You can only fix a problem if you can successfully reproduce in your local environment. I know this sound common sense but ask yourself honestly how many times you have tried solutions without reproducing the problem in a local environment only to discover that your proposed solution does not work. This happened to me this week.
Composite index in PostgreSQL can help you avoid heap sort if you do order by in your query.
First thing you need to do is to get out of the denial mode. Mental issues can happen with anyone. Setup a weekly wellbeing check-up with yourself.
Create a schedule and stick to it. Know when to stop and detach from work.
Setup a separate remote working space.
Limit your digital life. Talk to people
Amdahl’s law: 10 mins read. Amdahl’s Law is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. In essence, it says you can’t speed up beyond the sequential part of your program irrespective of how many cores you add. Gene Amdahl’s said, If 50% of the execution time is sequential, the maximum speed up is 2, no matter how many cores you use. Good video that you can watch on Amdahl’w Law is by Professor Ben H. Juurlink.
and their understanding of timeline of events as they occurred.
Love DevOps? Wait until you meet SRE: 10 mins read. If you have not heard about SRE then this post will help you get started. SRE as defined by its mastermind Ben Treynor is “SRE is what happens when a software engineer is tasked with what used to be called operations”.
Google’s Chrome Becomes Web ‘Gatekeeper’ and Rivals Complain: 15 mins read. I have read this multiple times. Chrome is at the core of Google’s digital strategy. Google needs to track us to show ads and make money. This is the reason they are coming up with updated Chrome Extension API that will limit what ad blockers can do. In my view, the big problem is not Chrome or Google. We have ads because people want to earn money from their content. Google does not put ads magically; site owners add Google ad tracking scripts that share information with Google. Till the time, we don’t create a better financial model for content creators. This problem can’t be solved. Brave browser by Brendan Eich, co-founder of Mozilla and the current CEO of Brave Software Inc. is trying to do some work on it but it is still early days for it.
Flaky tests are useful at finding underlying flaws in our application. In some cases when fixing a flaky test, the fix is in the app, not in the test
Common patterns of flaky tests
Flaky tests caused by hard coded ids because they rely on database sequences
Making bad assumptions about DB ordering. Result returned by SQL query is unordered.
Incorrect assumptions about time
Bad assumptions about the environment
Run test suite in a tight loop, over and over again on a cloud server. Each time tests fail we flag them and at the end of a week of continuous running we mark flaky specs as “skipped” pending repair.
One big issue with flaky tests is that quite often they are very hard to reproduce. To accelerate a repro I tend to try running a flaky test in a loop.
Invest in fast test suite
Add purpose built diagnostic code to debug flaky tests you can not reproduce
You need neither PWA nor AMP to make your website load fast: 10 mins read. Author writes, “why was AMP needed? Well, basically Google needed to lock content providers to be served through Google Search. But they needed a good cover story for that. And they chose to promote it as a performance solution”. I kind of agree with author that AMP hurts the web community more than it helps. I have disabled AMP in my blog.
Fast key-value stores: An idea whose time has come and gone: 30 mins read. Interesting paper by Google on building stateful services instead of stateless. I also went with stateful service architecture in my last application. It has its own challenges but in some cases it is the only viable option.
With Lambda you pay per invocation and the price is based on the memory you allocate for your function (up to 30GB) and its execution time. The amount of compute available to your Lambda function is based on it’s memory allocation. This pricing model is ideal for workloads that have spikes and/or long periods of downtime.
Fargate, on the other hand, lets you configure how many VCPUs (up to 8) and GBs of memory (up to 3GB) you want your Fargate tasks to have independently, priced by the secondrounded up to one minute.
Learning to Listen to one’s own Boredom: 15 mins read. All of us need to learn to develop a ‘late style’ – ideally as early on in our lives as possible: a way of being wherein we shake off the dead hand of habit and social fear and relearn to listen to what entertains us
Do every act of your life as though it were the very last act of your life – Marcus Aurelius
Learn more programming languages, even if you won’t use them: 10 mins read. I first got this advice few years back when I watched a two minute video by Bjarne Stroustrup, creator of C++. He recommended you should not call yourself programmer if you only know one programming language. The magic number he mentioned in the video was 5. This post also makes the same point. Different programming languages are good at different things. Every programming language makes a tradeoff. They help you think about a problem in different way. I got hang of functional programming once I learnt Scheme basics. I try to learn a new programming language every couple of years. I need to start using them in my side projects.
Announcing AMP Real URL: 20 mins read. In case you are not aware, AMP stands for Accelerated Mobile Pages. AMP is an open source standard led by Google that helps speed up access to websites by caching the content near to user. This is good for readers but for content producers there were few issues. The biggest issue with AMP is that rendered webpage has a URL starting with https://google.com/amp/. Users have become used to looking at the navigation bar in a web browser to see what web site they are visiting. The AMP cache breaks that experience. In this post by Cloudflare folks authors talk about how they fixed the real origin URL problem with AMP using web packaging and Cloudflare workers.
Infrastructure as Code, Part One: 15 mins read. This is an introductory read on infrastructure as code. If you are not aware of it then you should give it a read. It is a nicely written introduction to IaC.
When rules don’t apply: 30 mins read. This is a 30 mins video that talks about how executives at Apple, Google, eBay, Intuit, and other big tech companies conspire against their own employees by secretly agreeing among themselves not to hire each other employees. Tech companies treat their employees as their assets and cheat them.
You should different compute services based on the use case. The post talks about why author used both AWS Lambda and AWS Fargate. For short computation jobs use lambda and for long compute jobs that have no designated end use AWS Fargate.
Give a thought on isolation model when deciding which compute service to use. AWS Lambda compute instances are isolated from each other so if one rogue your application will not suffer.
When you are building a side project or building MVP for your startup your goal should be to minimise maintenance and operation tasks. Serverless services help you do that.
AWS CDK is a service that allows you to write IaC in your own preferred language.
Thundering Herds & Promises: 10 mins read. I love this kind of posts which share how team solved a real-world technical problem. This post covers how Instagram solved the thundering herd problem with their cache using the promises. The below explains explains what thundering herd problem means
> If your cache is hit with 100 concurrent requests, then, since the cache is empty, all of them will get a cache-miss at the one moment, resulting in 100 individual requests to the backend. If the backend is unable to handle this surge of concurrent requests (ex: capacity constraints), additional problems arise. This is what’s sometimes called a thundering herd.
The Good and the Bad of Google Cloud Run: 10 mins read. The key point made in this post is that Google Cloud Run is not FaaS. Google Cloud Run allows developers to push container images with HTTP server to GCP and GCP takes care of running them at scale. If you have build pure serverless application you will know that pure serverless apps architecture is event-driven service-full architecture. This forces developers to think about applications in a different way. According to author, Cloud Run is providing a safety blanket for developers intimidated by the paradigm shift of FaaS and service-full architecture.
Try to do a small quiz on what you have learnt or try to explain it to someone
I also apply similar technique in my newsletter by summarising what I have learnt from a post in my own words.
An Overview of Go’s Tooling: 30 mins read. This is the post that you should bookmark if you are a Go developer. The post covers most the Go tools a developers need to interact with. I wish more such posts should be written for other languages as well.