This week I was talking to a developer about how to think about data models supported by different databases. One thing that I have learnt in my 15 years of building web applications is that data models play an important role in the success of any software application.
A data model provides an abstract way to organise elements of data and to describe how these elements relate to each other. Data models describe the structure, manipulation, and integrity aspects of the data stored in data management systems such as relational databases. For example, when you model your problem domain in a relational database, you think in terms of real-world entities and how those entities are related to each other. We usually use Entity Relationship (ER) diagrams to model tables in relational databases. In this post, we will focus on two popular data models — Relational and Document. We will discuss when you would choose one over the other.
A data model influences two main attributes:
- Ease of use: a data model can make some operations easy to achieve and others difficult or impossible
- Performance: a data model can be suitable for faster reads but slower writes, or vice versa
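To make the contrast concrete, here is a hypothetical sketch (all names are my own, not from any real schema) of the same user-and-orders data in both models: the relational style keeps flat entities linked by IDs and joins them at query time, while the document style embeds everything in one nested, self-contained record.

```java
import java.util.List;

// Relational style: flat entities linked by a foreign key, joined at query time.
record User(long id, String name) {}
record Order(long id, long userId, String item) {}

// Document style: one self-contained nested document, read in a single fetch.
record UserDocument(long id, String name, List<String> orderItems) {}

public class DataModels {
    // Denormalise the relational entities into a single document.
    public static UserDocument toDocument(User user, List<Order> orders) {
        List<String> items = orders.stream()
                .filter(o -> o.userId() == user.id()) // the "join" done in code
                .map(Order::item)
                .toList();
        return new UserDocument(user.id(), user.name(), items);
    }
}
```

The trade-off in the bullet points above falls out of this shape: the document form makes "fetch a user with their orders" one cheap read, while the relational form makes cross-entity queries ("all orders containing a pen") easy.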
MemSQL is a fast, commercial, ANSI SQL compliant, highly scalable HTAP database. HTAP databases are those that support both OLTP and OLAP workloads. It supports ACID transactions just like a regular relational database. It also supports document and geospatial data types.
MemSQL is fast because it stores data in memory. But that does not mean it is not durable. It maintains a copy of the data on disk as well. Transactions are committed to a transaction log on disk and later compressed into full-database snapshots. One of the main reasons new databases are designed as in-memory first is that memory is getting cheaper every year. It is estimated that memory becomes roughly 40% cheaper every year.
MemSQL has tuneable durability. You can make it fully durable or completely ephemeral, and durability can be synchronous or asynchronous.
MemSQL simplifies your architecture because you don’t have to write ETL jobs to move data from one data store to another. This is the biggest selling point of any HTAP database.
I enjoy working through system design problems. It helps me think about how I would design interesting features of various systems. I will post design solutions to interesting problems.
Today, I will share how I would design Amazon’s recently viewed items page. You can view this page by going to https://www.amazon.com/gp/history/
For me, it showed the last 73 items I viewed on Amazon.com. I don’t think they are showing the last N items; rather, they are showing items that I viewed in the last X days (or months), with some maximum limit.
Let’s redefine the problem now that we understand it better.
Design the Amazon recently viewed items page API. The recently viewed items are all the items that you viewed in the last 6 months, with a maximum count of 100 items.
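Here is a minimal sketch of the per-user store that could back such an API; the class and method names are my own assumptions, not Amazon’s actual design. It caps the history at 100 items, moves a re-viewed item back to the front, and filters out views older than the 6-month window at read time.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical per-user store for the recently viewed items API.
public class RecentlyViewed {
    private static final int MAX_ITEMS = 100;
    private static final Duration RETENTION = Duration.ofDays(180); // ~6 months

    private record View(String itemId, Instant viewedAt) {}

    private final Deque<View> views = new ArrayDeque<>(); // newest first

    public void recordView(String itemId, Instant when) {
        views.removeIf(v -> v.itemId().equals(itemId)); // re-view moves the item to the front
        views.addFirst(new View(itemId, when));
        while (views.size() > MAX_ITEMS) {
            views.removeLast(); // evict the oldest item beyond the cap
        }
    }

    public List<String> recentItems(Instant now) {
        // Apply the retention window lazily on read.
        return views.stream()
                .filter(v -> Duration.between(v.viewedAt(), now).compareTo(RETENTION) <= 0)
                .map(View::itemId)
                .toList();
    }
}
```

In a real system this structure would live behind the API in a per-user key, e.g. a list in a low-latency store, but the eviction and windowing logic would look much the same.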
The time to read this newsletter is 130 minutes.
A busy calendar and a busy mind will destroy your ability to create anything great. – Naval Ravikant
- GitHub stars won’t pay your rent: 20 mins read. The key point in the post is that you should not feel bad about charging money for your work. I think we software developers have taken it too far. Most of us feel that by making our work open source we are making the world better. But the reality is that if you lose your job and need financial support, no user of your open source project will come to help. We need to become practical and keep financial reality in mind.
- Building a Kubernetes platform at Pinterest: 15 mins read. You can learn a lot about Kubernetes from this post by the Pinterest engineering team. The key points for me are:
- You can use a CRD to define your organisation-specific service
- CRD can be used as an alternative to Helm
- The infrastructure team has three main priorities: 1) Service Reliability 2) Developer Productivity 3) Infra Efficiency
- Six Shades of Coupling: 15 mins read.
- When Redundancy Actually Helps: 10 mins read.
- The (not so) hidden cost of sharing code between iOS and Android: 10 mins read. So, we have come full circle. Organisations are moving away from the code-sharing approach when building the same application for different mobile platforms. I have seen multiple organisations using C++ to write shared code. The use of C++ limits the number of developers you can find in the market and overall slows you down. You have to build tools to support your custom journey.
- 3 Strategies for implementing a microservices architecture: 5 mins read. The three strategies are:
- The Strangler method
- The Lego strategy
- The nuclear option
- Microservices, Apache Kafka, and Domain-Driven Design: 20 mins read.
- Habits vs. Goals: A Look at the Benefits of a Systematic Approach to Life: 10 mins read.
- Building an analytics stack from scratch: 15 mins read.
- Cutting Through Indecision & Overthinking: 10 mins read. Take action. Half the battle is won if you get started.
Video of the week
One of the mental models that I find useful for prioritising my todo list is the Eisenhower matrix, named after US President Dwight D. Eisenhower, who once said:
I have two kinds of problems, the urgent and the important. The urgent are not important, and the important are never urgent.
The Eisenhower matrix is a simple decision-making tool for organising your tasks. You can use it to find the task you should act on first.
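As a toy illustration, the matrix is just a four-way classification on two booleans; the action names below follow the common “do / schedule / delegate / eliminate” reading of the quadrants.

```java
// The Eisenhower matrix as code: classify a task by urgency and importance
// into one of the four recommended actions.
public class EisenhowerMatrix {
    public enum Action { DO_FIRST, SCHEDULE, DELEGATE, ELIMINATE }

    public static Action classify(boolean urgent, boolean important) {
        if (urgent && important) return Action.DO_FIRST; // do it now
        if (important) return Action.SCHEDULE;           // decide when to do it
        if (urgent) return Action.DELEGATE;              // hand it off
        return Action.ELIMINATE;                         // drop it
    }
}
```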
Today, I was looking at the JDK 8 `Collections.max` function declaration and noticed a weird `&` in the type declaration. Most Java developers will not remember the exact function declaration, so I am writing it below.

```java
public static <T extends Object & Comparable<? super T>> T max(Collection<? extends T> coll)
```
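The `&` declares an intersection bound. The extra `extends Object` keeps the erased signature of `max` returning `Object`, which preserves binary compatibility with the pre-generics version of the method, while `Comparable<? super T>` lets a type inherit its natural ordering from a supertype. The hypothetical `Animal`/`Dog` classes below (my own example) show why the `? super T` part matters:

```java
import java.util.Collections;
import java.util.List;

public class MaxDemo {
    // The natural ordering is defined on the supertype.
    static class Animal implements Comparable<Animal> {
        final int weight;
        Animal(int weight) { this.weight = weight; }
        public int compareTo(Animal other) { return Integer.compare(weight, other.weight); }
    }

    // Dog implements Comparable<Animal>, not Comparable<Dog>.
    static class Dog extends Animal {
        Dog(int weight) { super(weight); }
    }

    public static Dog heaviest(List<Dog> dogs) {
        // T = Dog works because Comparable<? super Dog> matches Comparable<Animal>.
        // A bound of plain Comparable<T> would have rejected this call.
        return Collections.max(dogs);
    }
}
```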
The time to read this newsletter is 210 minutes.
The general who wins a battle makes many calculations in his temple before the battle is fought. – Sun Tzu
- All the best engineering advice I stole from non-technical people – 20 mins read. The points that resonated with me:
- Know what people are asking you to be an expert in. This helps you avoid straying too far into other people’s territory.
- Thinking is also work. This is especially true when you move to management.
- Effective teams need trust. That’s not to say that frameworks for decision making or metrics tracking are not useful, they are critical — but replacing trust with process is called bureaucracy.
- Fast and flexible observability with canonical log lines – 20 mins read. Canonical logging is a simple technique where in addition to their normal log traces, requests also emit one long log line at the end that includes many of their key characteristics. The key points for me in this post are:
- Use logfmt to make logs machine readable
- We use canonical log lines to help address this. They’re a simple idea: in addition to their normal log traces, requests (or some other unit of work that’s executing) also emit one long log line at the end that pulls all its key telemetry into one place.
- Canonical lines are an ergonomic feature. By colocating everything that’s important to us, we make it accessible through queries that are easy for people to write, even under the duress of a production incident
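As an illustration (not Stripe’s actual implementation), a canonical log line builder can be as simple as accumulating key/value pairs while a request is handled and rendering them as a single logfmt line at the end:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: collect key request telemetry during handling,
// then emit one logfmt-formatted canonical line when the request finishes.
public class CanonicalLogLine {
    private final Map<String, Object> fields = new LinkedHashMap<>(); // keeps insertion order

    public CanonicalLogLine set(String key, Object value) {
        fields.put(key, value);
        return this;
    }

    // logfmt: space-separated key=value pairs, values quoted if they contain spaces.
    public String emit() {
        StringBuilder line = new StringBuilder("canonical-log-line");
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            String value = String.valueOf(e.getValue());
            if (value.contains(" ")) {
                value = "\"" + value + "\"";
            }
            line.append(' ').append(e.getKey()).append('=').append(value);
        }
        return line.toString();
    }
}
```

A request handler would call `set(...)` as facts become known (method, path, user, status, duration) and `emit()` once, in a finally block, so every request produces exactly one queryable line.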
- Why Some Platforms Thrive and Others Don’t – 25 mins read. When evaluating an opportunity involving a platform, entrepreneurs (and investors) should analyze the basic properties of the networks it will use and consider ways to strengthen network effects. It’s also critical to evaluate the feasibility of minimizing multi-homing, building global network structures, and using network bridging to increase scale while mitigating the risk of disintermediation. That exercise will illuminate the key challenges of growing and sustaining the platform and help businesspeople develop more-realistic assessments of the platform’s potential to capture value
- How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh – 30 mins read. A long read that makes the case for a distributed data mesh. It applies DDD principles to designing a data lake. A refreshing take on how to design data lakes.
- Re-Architecting the Video Gatekeeper – 15 mins read. This post covers how one of the Netflix tech teams used Hollow to improve the performance of their service. Hollow is a total high-density cache built by Netflix. The post covers why a near cache was suitable for their use case. This is a detailed post covering the existing and new architecture. I learnt a lot while reading this article.
- Our not-so-magic journey scaling low latency, multi-region services on AWS – 20 mins read. This is another detailed post covering how Atlassian built a low latency service. They first tried DynamoDB, but that didn’t cut it for them. So, they also used a Caffeine-based near cache to achieve the numbers expected from their service.
- Making Containers More Isolated: An Overview of Sandboxed Container Technologies – 25 mins read.
- Benchmarking: Do it with Transparency or don’t do it at all – 20 mins read. This is a detailed rebuttal by Ongress team on MongoDB blog where they dismissed the benchmark report created by Ongress.
- Deconstructing the Monolith: Designing Software that Maximizes Developer Productivity – 20 mins read. This article by Shopify is a must read for anyone planning to adopt Microservices architecture. It is practical and pragmatic. The key points for me in this post are:
- Application architecture evolves over time. The right way to think about evolution is to go from Monolith -> Modular monolith -> Microservices.
- Monolithic architecture has many advantages.
- Monolithic architecture can take an application very far since it’s easy to build and allows teams to move very quickly in the beginning to get their product in front of customers earlier.
- You’ll only need to maintain one repository, and be able to easily search and find all functionality in one folder.
- It also means only having to maintain one test and deployment pipeline, which, depending on the complexity of your application, may avoid a lot of overhead.
- One of the most compelling benefits of choosing the monolithic architecture over multiple separate services is that you can call into different components directly, rather than needing to communicate over web service APIs
- Disadvantages of Monolithic architecture
- As the system grows, the challenge of building and testing new features increases
- High coupling and a lack of boundaries
- Developing in Shopify required a lot of context to make seemingly simple changes. When new Shopifolk onboarded and got to know the codebase, the amount of information they needed to take in before becoming effective was massive
- Microservices architecture increases deployment and operational complexity. The tools that work great for monolithic code bases stop working with a microservices architecture.
- A modular monolith is a system where all of the code powers a single application and there are strictly enforced boundaries between different domains.
- Approach to move to Modular monolith
- Reorganize code by real-world concepts and boundaries
- Ensure all tests work after reorganisation
- Build tools that help track progress of each component towards its goal of isolation. Shopify developed a tool called Wedge that highlights any violations of domain boundaries (when another component is accessed through anything but its publicly defined API), and data coupling across boundaries
- According to Martin Fowler, “almost all the cases where I’ve heard of a system that was built as a microservice system from scratch, it has ended in serious trouble… you shouldn’t start a new project with microservices, even if you’re sure your application will be big enough to make it worthwhile.”
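To sketch the idea behind a Wedge-style boundary check (this is my own toy version, not Shopify’s actual tool): a cross-component reference is a violation unless the target class is part of the owning component’s publicly declared API.

```java
import java.util.Map;
import java.util.Set;

// Toy boundary checker for a modular monolith: components own classes,
// and each component exports a set of classes as its public API.
public class BoundaryCheck {
    private final Map<String, String> classToComponent; // class name -> owning component
    private final Map<String, Set<String>> publicApi;   // component -> exported classes

    public BoundaryCheck(Map<String, String> classToComponent,
                         Map<String, Set<String>> publicApi) {
        this.classToComponent = classToComponent;
        this.publicApi = publicApi;
    }

    public boolean isViolation(String callerClass, String targetClass) {
        String callerComponent = classToComponent.get(callerClass);
        String targetComponent = classToComponent.get(targetClass);
        if (callerComponent.equals(targetComponent)) {
            return false; // calls within a component are always allowed
        }
        // Cross-component calls are only allowed through the public API.
        return !publicApi.getOrDefault(targetComponent, Set.of()).contains(targetClass);
    }
}
```

A real tool would derive the class-to-component map from the repository layout and the call graph from static analysis, but the allow/deny rule itself stays this simple.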
- “It’s dead, Jim”: How we write an incident postmortem – 15 mins read. I believe it is a good exercise to do a postmortem even if you don’t follow SRE practices. The key points for me in this post are:
- A postmortem is the process by which we learn from failure, and a way to document and communicate those lessons.
- Why to write one?
- It allows us to document the incident, ensuring that it won’t be forgotten.
- They are the most effective mechanism we can use to drive improvement in our infrastructure.
- You should share postmortems because your customers deserve to know why their services didn’t behave as expected
- We shouldn’t be satisfied with identifying what triggered an incident (after all, there is no root cause), but should use the opportunity to investigate all the contributing factors that made it possible, and/or how our automation might have been able to prevent this from ever happening.
- What we want is to learn why our processes allowed that mistake to happen, and to understand if the person who made the mistake was operating under wrong assumptions.