A couple of months back I watched a video by Andy Pavlo, Associate Professor of Databases at Carnegie Mellon, in which he argued that databases should not use mmap. He went on to say that if there is only one thing you take away from his database course, it should be to never use mmap when designing and building database management systems. I had not used mmap before, so I was intrigued to understand it in more detail. I was aware that MongoDB used to have an mmap-based storage engine. It allowed them to achieve a faster time to market, but they later had to replace it with a new storage engine, WiredTiger, because of the issues they faced with mmap. MongoDB is not the only database that has used mmap. Many others use it as well, including RavenDB, Elasticsearch, LevelDB, InfluxDB, LMDB, BoltDB, and moss (a key-value store from Couchbase).
Given that so many databases use mmap, I wanted to understand why Andy recommends against it. In this post I will list all of the reasons I could find in my research and in Andy’s video. But before we do that, let’s first understand mmap.
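As a starting point, here is a minimal sketch of what mmap does, using Python's standard mmap module. The temporary file and its contents are made up for illustration. The point it shows is that once a file is mapped, the program reads and writes it like an in-memory byte array, and the operating system decides when pages are loaded from and written back to disk.

```python
import mmap
import os
import tempfile


def mmap_roundtrip() -> bytes:
    """Write a small file, map it into memory, and edit it through the mapping."""
    # Hypothetical data file, created only for this demonstration.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(b"hello mmap")
            f.flush()
            # Length 0 maps the whole file; the default access is read/write.
            with mmap.mmap(f.fileno(), 0) as mm:
                first_word = mm[:5]   # reading looks like slicing a bytes object
                mm[:5] = b"HELLO"     # writing modifies the mapped pages directly
                mm.flush()            # ask the OS to write the dirty pages back
            f.seek(0)
            contents = f.read()       # regular file I/O sees the change
        assert first_word == b"hello"
        return contents
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(mmap_roundtrip())
```

Notice that the code never calls read() or write() on the mapped region. That convenience is what makes mmap attractive, and handing control of I/O to the operating system is where the trouble for databases starts.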
Here are 10 posts I thought were worth sharing this week.
#1. What we learnt by migrating from CircleCI to Buildkite – Link
This post covers how and why Hasura switched their CI service from CircleCI to Buildkite. They started by defining the requirements from their CI, then they evaluated different solutions, and finally introduced it in their ecosystem. Their main reason to switch CI service was cost. They reduced the cost by 50%. This required them to own some of the aspects of the CI operations. A couple of interesting things I learnt from this post:
Use of labels to trigger build. They used them to save costs.
The use of dynamic configuration. They wrote their build code in a Go program. This saved them from YAML hell. Interestingly, they use shellcheck for static analysis of their shell scripts.
It is all about perspective. Tech debt evokes negative emotions in people, which makes it difficult to sell to higher management. In this post, the author suggests we reframe tech debt as tech wealth when communicating with stakeholders. Building tech wealth means getting more value out of the software we’re creating, as well as our efforts to develop and maintain it. The author suggests two ways we can plan for tech wealth:
Allocate time within each planning cycle
Dedicate the last few cycles in a quarter
In one of the products I worked on, we used to schedule 1 day per sprint (2 weeks) for paying down tech debt. We had a sprint demo every alternate Thursday, and the next day, Friday, was reserved for working on tech debt items. One problem with 1 day every sprint is that bigger items can’t be handled. We used to create stories for those and pick them up as part of the sprint backlog.
Today, I was doing solution design for a system when I started to think about when we should use a JSON data type for columns. Coming up with the right schema design takes multiple iterations. I consider it more an art than a science. All the mainstream RDBMSs have JSON data type support.
Postgres has had a JSON data type since version 9.2. Version 9.2 was released in September 2012
MySQL has had a JSON data type since version 5.7.8. Version 5.7.8 was released in August 2015
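SQL Server has had JSON support since SQL Server 2016, which was released in June 2016. In that version JSON is stored in NVARCHAR columns and handled through built-in JSON functions rather than a dedicated data type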
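Oracle has had JSON support since version 12c (12.1.0.2), with JSON stored in VARCHAR2, CLOB, or BLOB columns. This is also how version 19c, released in 2019, handles it. A native JSON data type arrived in 21c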
They all support efficient insertion and querying of JSON data. I will not compare their JSON capabilities today. Today, I want to answer a design question – when should a column have a JSON data type?
I use the JSON data type in design situations mentioned below. There could be other places as well where JSON is a suitable data type.
Dump request data that will be processed later
Support extra fields
One-to-many relationship where the many side does not need its own identity
Key Value use case
Simpler EAV design
Let’s talk about each of these use cases in more detail.
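To make the "support extra fields" case concrete, here is a minimal sketch. It uses SQLite through Python's standard sqlite3 module as a stand-in for the databases listed above, purely so that it runs without a server. The customer table, its columns, and the sample rows are hypothetical, and the sketch assumes the SQLite build includes the JSON functions (most current builds do). SQLite keeps the JSON in a TEXT column, while Postgres or MySQL would use their own JSON column types and operators.

```python
import json
import sqlite3


def newsletter_subscribers() -> list:
    """Store optional attributes in one JSON column and filter on one of them."""
    conn = sqlite3.connect(":memory:")
    # Fixed, well-known attributes get real columns; everything else goes in `extra`.
    conn.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, extra TEXT NOT NULL)"
    )
    rows = [
        ("Asha", {"twitter": "@asha", "newsletter": True}),
        ("Ben", {"newsletter": False}),
        ("Chen", {}),  # no extra fields at all
    ]
    conn.executemany(
        "INSERT INTO customer (name, extra) VALUES (?, ?)",
        [(name, json.dumps(extra)) for name, extra in rows],
    )
    # json_extract returns 1 for JSON true, 0 for false, NULL for a missing key.
    cursor = conn.execute(
        "SELECT name FROM customer "
        "WHERE json_extract(extra, '$.newsletter') = 1 ORDER BY name"
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names


if __name__ == "__main__":
    print(newsletter_subscribers())
```

Adding a new optional attribute here needs no schema migration, which is the main appeal of this use case. The trade-off is that the database no longer enforces the shape of those fields.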
There are many reasons why software projects fail. In this post I will cover what I think is one of the main reasons outsourced product development fails to deliver the right product at the right time. The reason is that customers outsource product management as well. They think their job is done after sharing the wireframes. These wireframes are typically created by a third-party design agency that the customer’s product team works closely with. They will usually call this an MVP. The only thing they take from the whole MVP concept is that it needs to be delivered faster to the end customer. They completely ignore the minimal part. I usually see MVPs with more than 500 screens, and these do not include failure states. I know screens are not the right measure of application complexity, but during the proposal phase this is the most you will get.
For the last couple of weeks I have been going over articles and videos in the Amazon Builders’ Library. They cover useful patterns that Amazon uses to build and operate software. Below are the important points I captured while going over the material.
Reliability, constant work, and a good cup of coffee – Link
Amazon systems strive to solve problems using reliable constant work patterns. These work patterns have three key features:
One, they don’t scale up or slow down with load or stress.
Two, they don’t have modes, which means they do the same operations in all conditions.
Three, if they have any variation, it’s to do less work in times of stress so they can perform better when you need them most.
There are not many problems that can be efficiently solved using constant work patterns.
For example, if you’re running a large website that requires 100 web servers at peak, you could choose to always run 100 web servers. This certainly reduces a source of variance in the system, and is in the spirit of the constant work design pattern, but it’s also wasteful. For web servers, scaling elastically can be a better fit because the savings are large. It’s not unusual to require half as many web servers off peak as during the peak.
Based on the examples given in the post, it seems that a constant work pattern is suitable for use cases where system reliability, stability, and self-healing are the primary concerns. It is fine if the system does some wasteful work and costs more. These are essential concerns for systems that others build their own systems on. I think control plane systems fall under this category. The example mentioned in the post is a system that applies configuration changes to foundational AWS components like the AWS Network Load Balancer. The solution can be designed using either a push-based or a pull-based approach. The pull-based constant work approach lends itself to a simpler and more reliable design.
Although not mentioned in the post, constant work that the system is doing should be idempotent in nature.
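Here is a minimal sketch of how I picture the pull-based constant work pattern. This is my own illustration, not Amazon's implementation: the LoadBalancerNode class, the fetch_snapshot callback, and the configuration keys are all hypothetical. On every cycle the worker pulls the complete desired configuration and applies all of it, whether or not anything changed, so the amount of work per cycle does not depend on the rate of change.

```python
from typing import Callable, Dict

Config = Dict[str, str]


class LoadBalancerNode:
    """Hypothetical component that receives configuration from a control plane."""

    def __init__(self) -> None:
        self.config: Config = {}
        self.applies = 0  # counts cycles, to show the work is constant

    def apply_full_config(self, desired: Config) -> None:
        # Idempotent: the whole state is replaced with the snapshot, so
        # applying the same snapshot twice leaves the node unchanged.
        self.config = dict(desired)
        self.applies += 1


def run_cycles(
    fetch_snapshot: Callable[[], Config], node: LoadBalancerNode, cycles: int
) -> None:
    """Pull and apply the full snapshot on every cycle; no 'nothing changed' mode."""
    for _ in range(cycles):
        # A real worker would wait for a fixed interval between cycles.
        node.apply_full_config(fetch_snapshot())


if __name__ == "__main__":
    store: Config = {"listener": "443"}
    node = LoadBalancerNode()
    run_cycles(lambda: store, node, cycles=3)  # nothing changed, same work done
    store["listener"] = "8443"
    run_cycles(lambda: store, node, cycles=1)  # a change is just another cycle
    print(node.config, node.applies)
```

Because a missed or repeated cycle is harmless, the worker heals itself on the next pull. This is why idempotency matters for this pattern.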
This is an amazing read. Etsy engineer Salem Hilal shares their journey from ES6 to TypeScript. In this post, he covers the strategy, the technical challenges they faced, the tooling they built, and how they educated their engineers to write effective TypeScript code. Etsy has been built over the last 16 years and they had 17,000 JS files. Migrating such a codebase is a multi-year effort. You need to have a clear plan and ensure there are no tail migration issues.
A couple of months back a customer wanted us to migrate their 20+ TB Oracle database to Postgres. They had hundreds of stored procedures written in Oracle, and their batch processing jobs were written as stored procedures too. They wanted to do the complete migration in a couple of months. We politely told them it was not possible. They went with another vendor that said they could do it in two months. Migrations are very risky. There are so many unknowns involved. For a vendor it is even more difficult because they don’t understand your functional requirements and code base. For migrations I prefer to be safe rather than sorry.
Little’s law states that the long-term average number L of customers in a stationary system is equal to the long-term average effective arrival rate λ multiplied by the average time W that a customer spends in the system.
L = Average number of customers in a stationary system
λ = Average arrival rate in the system
W = Average time a customer spends in the system
In the context of an API it means:
L = Average number of concurrent requests the system can serve
λ = Average arrival rate of requests in the system
W = Average latency of each request
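A small worked example makes the relationship L = λW easier to use. The sketch below is plain arithmetic, and the function names and traffic numbers are made up for illustration.

```python
def required_concurrency(arrival_rate: float, latency_seconds: float) -> float:
    """L = λ * W: average number of requests in flight."""
    return arrival_rate * latency_seconds


def max_throughput(concurrency: float, latency_seconds: float) -> float:
    """λ = L / W: arrival rate a fixed concurrency limit can sustain."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_seconds


if __name__ == "__main__":
    # 200 requests/second at 250 ms average latency -> 50 requests in flight.
    print(required_concurrency(200, 0.25))
    # 100 worker threads at 500 ms average latency -> 200 requests/second.
    print(max_throughput(100, 0.5))
```

Read the other way round, if the latency of that second API doubles to 1 second, the same 100 workers can sustain only 100 requests per second.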
Here are 11 posts I thought were worth sharing this week.
To Learn a New Language, Read Its Standard Library – Link
I like the idea of learning a new language by reading its standard library. You learn the idiomatic way of writing code in a language by reading source code written by its original authors. I am planning to learn Rust, and I will give this approach a try. There are two situations where you might struggle with this approach: 1) when the standard library is poorly documented, and 2) when the standard library is implemented in a lower-level language.
In this post, Subbu Allamaraju shares his thoughts on how you can be both a nice and an effective leader. He talks about six different leadership styles and how those styles create positive and negative climates. I am in my first engineering leadership role and still figuring out my leadership style. Based on my limited experience, I think a leader can have multiple leadership styles, and there are times you have to course correct and change your style based on the situation and context. Also, I think leaders can be “nice” and “not nice” depending on the context. Leadership is hard.
42 things I learned from building a production database – Link
Not a deep technical post, but it has many useful pieces of advice from Mahesh Balakrishnan. He worked on a Chubby-like system at Facebook. My favorites:
Be conservative on APIs and liberal with implementations
When designing APIs, write code for one implementation; plan actively for the second implementation; and hope/pray that things will work for a third implementation.
Anything that can’t be measured easily (e.g., consistency) is often forgotten; pay particular attention to attributes that are difficult to measure