Key Insights From Amazon Builder Library

For the last couple of weeks I have been going over articles and videos in the Amazon Builder library. They cover useful patterns that Amazon uses to build and operate software. Below are the important points I captured while going over the material.

  1. Amazon systems strive to solve problems using reliable constant work patterns. These work patterns have three key features:
    • One, they don’t scale up or slow down with load or stress. 
    • Two, they don’t have modes, which means they do the same operations in all conditions. 
    • Three, if they have any variation, it’s to do less work in times of stress so they can perform better when you need them most.
  2. Not many problems can be solved efficiently with constant work patterns.
    • For example, if you’re running a large website that requires 100 web servers at peak, you could choose to always run 100 web servers. This certainly reduces a source of variance in the system and is in the spirit of the constant work design pattern, but it is also wasteful. For web servers, scaling elastically can be a better fit because the savings are large. It’s not unusual to need half as many web servers off-peak as during the peak.
  3. Based on the examples in the post, a constant work pattern seems suitable for use cases where system reliability, stability, and self-healing are the primary concerns, and where it is acceptable for the system to do some wasted work and cost more. These concerns are essential for systems that others build their own systems on, and I think control plane systems fall into this category. The post’s example of such a system applies configuration changes to foundational AWS components like AWS Network Load Balancer. The solution can be designed with either a push-based or a pull-based approach; the pull-based constant work approach lends itself to a simpler and more reliable design.
  4. Although the post does not mention it, the constant work the system performs should be idempotent.
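To make the pull-based constant work idea and the idempotency point concrete, here is a minimal Python sketch. It is my own illustration under stated assumptions, not the design from the post: `DataPlaneNode`, `constant_work_loop`, and the in-memory `store` are hypothetical stand-ins for a data-plane host and wherever the control plane publishes config. Each cycle fetches and applies the entire config whether or not anything changed, and the apply replaces state instead of applying deltas, so repeating it is harmless and drift is corrected on the next cycle.

```python
import time
from typing import Callable, Dict

Config = Dict[str, str]  # hypothetical: listener name -> backend address


class DataPlaneNode:
    """Hypothetical node (e.g. a load balancer host) that holds routing config."""

    def __init__(self) -> None:
        self.routes: Config = {}
        self.apply_count = 0

    def apply(self, desired: Config) -> None:
        # Idempotent: replace the whole state with the desired state instead of
        # applying deltas, so applying the same config twice changes nothing and
        # any drift or missed update is overwritten on the next cycle.
        self.routes = dict(desired)
        self.apply_count += 1


def constant_work_loop(
    fetch_full_config: Callable[[], Config],
    node: DataPlaneNode,
    interval_s: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Pull and apply the *entire* config every interval, changed or not.

    The work per cycle does not depend on how many changes happened, so a burst
    of updates cannot overload the system. A real system would loop forever;
    `cycles` only bounds the loop for this sketch.
    """
    for _ in range(cycles):
        node.apply(fetch_full_config())
        sleep(interval_s)


# Usage: `store` stands in for wherever the control plane publishes config.
store = {"config": {"web": "10.0.0.1:80"}}
node = DataPlaneNode()
constant_work_loop(lambda: store["config"], node, interval_s=1.0, cycles=3, sleep=lambda s: None)

store["config"] = {"api": "10.0.0.2:8080"}  # "web" removed, "api" added
node.routes["rogue"] = "drifted"            # simulate local drift on the node
constant_work_loop(lambda: store["config"], node, interval_s=1.0, cycles=1, sleep=lambda s: None)
print(node.routes)  # {'api': '10.0.0.2:8080'}
```

Because the node always receives the full desired state, the deleted `web` entry and the drifted `rogue` entry both disappear without any special-case code. That self-healing property is what makes the extra, sometimes redundant work worth paying for.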
Continue reading “Key Insights From Amazon Builder Library”

Useful Stuff I Read This Week

Here are 7 posts I thought were worth sharing this week.

This is an amazing read. Etsy engineer Salem Hilal shares their ES6 to TypeScript journey. In this post, he covers the strategy, the technical challenges they faced, the tooling they built, and how they educated their engineers to write effective TypeScript code. Etsy has been built over the last 16 years and had 17,000 JavaScript files. Migrating such a codebase is a multi-year effort. You need a clear plan, and you need to make sure the migration does not stall on a long tail of files that never get migrated.

A couple of months back, a customer wanted us to migrate their 20+ TB Oracle database to Postgres. They had hundreds of stored procedures written in Oracle, and their batch processing jobs were also written as stored procedures. They wanted the complete migration done in a couple of months. We politely told them it was not possible. They went with another vendor that said it could be done in two months. Migrations are very risky and involve many unknowns. For a vendor it is even harder, because they don’t understand your functional requirements and codebase. For migrations, I would rather be safe than sorry.

Continue reading “Useful Stuff I Read This Week”

Understanding Little’s Law

Little’s law states that the long-term average number L of customers in a stationary system is equal to the long-term average effective arrival rate λ multiplied by the average time W that a customer spends in the system.

$$L = \lambda W$$

Where

L = Average number of customers in a stationary system
λ = Average arrival rate in the system
W = Average time a customer spends in the system

In the context of an API, this means:
L = Average number of concurrent (in-flight) requests in the system
λ = Average arrival rate of requests in the system
W = Average latency of a request
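Here is a small Python sketch of how the formula is typically used for an API. The function names and numbers are illustrative assumptions. It computes expected concurrency from traffic and latency, rearranges the law to find the arrival rate a fixed concurrency limit (such as a worker pool) can sustain, and checks L = λW against a tiny request log in which every request starts and finishes inside the observation window.

```python
from dataclasses import dataclass
from typing import List, Tuple


def concurrency(arrival_rate_per_s: float, avg_latency_s: float) -> float:
    """Little's law, L = lambda * W: average number of requests in flight."""
    return arrival_rate_per_s * avg_latency_s


def max_arrival_rate(max_concurrency: float, avg_latency_s: float) -> float:
    """Rearranged as lambda = L / W: the highest average arrival rate a
    concurrency limit (e.g. a worker pool size) can sustain at that latency."""
    return max_concurrency / avg_latency_s


@dataclass
class Request:
    arrival_s: float
    departure_s: float


def measure(requests: List[Request], window_s: float) -> Tuple[float, float, float]:
    """Estimate (L, lambda, W) from a request log, assuming every request
    arrives and completes inside the observation window [0, window_s]."""
    total_time_in_system = sum(r.departure_s - r.arrival_s for r in requests)
    avg_in_flight = total_time_in_system / window_s      # L: time-average concurrency
    arrival_rate = len(requests) / window_s              # lambda
    avg_latency = total_time_in_system / len(requests)   # W
    return avg_in_flight, arrival_rate, avg_latency


# 200 requests/s at 50 ms average latency keeps about 10 requests in flight.
print(concurrency(200, 0.050))
# 100 workers at 250 ms average latency can sustain at most 400 requests/s.
print(max_arrival_rate(100, 0.250))

log = [Request(0.0, 1.0), Request(0.5, 2.0), Request(1.0, 1.5)]
L, lam, W = measure(log, window_s=2.0)
print(L, lam * W)  # 1.5 1.5
```

The second function is often the more useful one for capacity planning: if latency rises under load, the same concurrency limit supports proportionally fewer requests per second.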

Continue reading “Understanding Little’s Law”

Useful Stuff I Read This Week

Here are 11 posts I thought were worth sharing this week.

I like the idea of learning a new language by reading its standard library. You learn the idiomatic way of writing code in a language by reading source code written by its original authors. I am planning to learn Rust, and I will give this approach a try. You might struggle with this approach in two cases: 1) the standard library is poorly documented, or 2) it is implemented in a lower-level language.

The author of the post shares his views on the important features he expects from a future scripting language. I agree with his list, and I would add a couple more:

  • Better tooling support. The author talks about IDEs, but I think tools for formatting, packaging, scaffolding new modules, etc. should also be part of the language’s standard tooling.
  • Simple. It should have one idiomatic way to do a task. Also, stability should be preferred over feature bloat. 
Continue reading “Useful Stuff I Read This Week”

Useful Stuff I Read This Week

Here are 8 posts I thought were worth sharing this week.

Being Nice and Effective

In this post, Subbu Allamaraju shares his thoughts on how you can be both a nice and an effective leader. He talks about six leadership styles and how those styles create positive and negative climates. I am in my first engineering leadership role and still figuring out my leadership style. Based on my limited experience, I think a leader can use multiple leadership styles. There are times when you have to course correct and change your style based on the situation and context. I also think leaders can be “nice” or “not nice” depending on the context. Leadership is hard.

42 things I learned from building a production database

It is not a deeply technical post, but Mahesh Balakrishnan shares many useful pieces of advice in it. He worked on a Chubby-like system at Facebook. My favorites:

  • Be conservative on APIs and liberal with implementations
  • When designing APIs, write code for one implementation; plan actively for the second implementation; and hope/pray that things will work for a third implementation.
  • Anything that can’t be measured easily (e.g., consistency) is often forgotten; pay particular attention to attributes that are difficult to measure
  • Make your project robust to re-orgs. 
Continue reading “Useful Stuff I Read This Week”

When to use shared libraries in Microservices architecture

One of the advantages of Microservices architecture is that it gives components deployment independence. Based on my consulting and software development experience, deployment independence is often overlooked, and very few teams achieve it. It matters because it brings true agility and reduces communication overhead between teams and services.

Shared libraries make Microservices tightly coupled and introduce hard dependencies. A team changing a shared library now has to ensure that the change does not break any other service that depends on it, which requires communication between multiple teams. A change in a shared library also forces every service that depends on it to be redeployed. This leads to long build, release, and deployment times, and we might have to consider the deployment order of services as well. All of this leads to more synchronization and communication between teams. So, the general recommendation is that teams in a Microservices architecture should avoid shared libraries.
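The redeployment cost can be made concrete with a small Python sketch that computes the "blast radius" of a shared library change: every component that depends on the library directly or transitively. The dependency graph and component names below are hypothetical; in practice you would derive the graph from your build files.

```python
from collections import deque
from typing import Dict, List, Set


def redeploy_set(depends_on: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every component that directly or transitively depends on `changed`."""
    # Invert the edges: for each component, who depends on it?
    dependents: Dict[str, List[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    # Breadth-first walk over the inverted graph, starting at the changed library.
    affected: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for component in dependents.get(current, []):
            if component not in affected:
                affected.add(component)
                queue.append(component)
    return affected


# Hypothetical dependency graph: component -> what it depends on.
graph = {
    "orders-service": ["common-lib"],
    "billing-service": ["common-lib", "billing-client-lib"],
    "billing-client-lib": ["common-lib"],
    "shipping-service": ["billing-client-lib"],
    "catalog-service": [],
}
print(sorted(redeploy_set(graph, "common-lib")))
# ['billing-client-lib', 'billing-service', 'orders-service', 'shipping-service']
```

A single change to `common-lib` touches four of the five components here, including `shipping-service`, which never imports `common-lib` directly. Only `catalog-service` keeps its deployment independence.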

Continue reading “When to use shared libraries in Microservices architecture”

Useful Stuff I Read This Week

Here are 10 posts I thought were worth sharing this week.

Who Is Driving the Great Resignation?

It is hard to retain good talent in tech. I agree with the reasons the post gives for why employees are resigning in huge numbers. Shortage of good talent, better compensation, lack of purpose, burnout, and career advancement are the reasons I hear as well. One reason I don’t see covered in the post is poor leadership. It might be implicit, but I think it should be called out. Good leadership can provide a sense of belonging and purpose that helps retain good employees.

Continue reading “Useful Stuff I Read This Week”

Web API Design Anti-Pattern: Exposing your database model

One of the common Web API design anti-patterns that I see in the field is exposing the database model in the API contract. In a Java Spring Boot JPA application, this means using JPA entities as the Web API’s request and response objects. This usually happens because most teams do not follow a contract-first approach to API design. They start from the code and database schema and then derive the API contract from them.
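Here is a minimal sketch of the alternative, written in Python to keep it self-contained rather than in Spring Boot. The dataclass `UserEntity` stands in for a JPA entity, `UserResponse` is a separately defined API contract, and all field names are hypothetical. The point is the explicit mapping step between the two.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class UserEntity:
    """Persistence model mirroring a table (a JPA @Entity in a Spring Boot app).
    Fields are hypothetical."""
    id: int
    email: str
    password_hash: str
    failed_login_count: int
    created_at: datetime


@dataclass(frozen=True)
class UserResponse:
    """API contract, designed first and owned independently of the schema."""
    id: str
    email: str
    member_since: str  # ISO-8601 date


def to_response(entity: UserEntity) -> UserResponse:
    # Explicit mapping: internal columns such as password_hash never reach the
    # client, and a column rename only touches this function, not the contract.
    return UserResponse(
        id=str(entity.id),
        email=entity.email,
        member_since=entity.created_at.date().isoformat(),
    )


user = UserEntity(42, "ana@example.com", "<hash>", 3, datetime(2021, 5, 1, 12, 30))
print(asdict(to_response(user)))
# {'id': '42', 'email': 'ana@example.com', 'member_since': '2021-05-01'}
```

The mapping function is a little extra code, but it is what lets the schema and the contract evolve separately.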

I have seen development teams apply this anti-pattern often, so I decided to document it and share this post in the future. The advantages of documenting such lessons, patterns, and practices are:

  • I can be thorough in my explanation. Writing helps me check whether my point is valid; writing is thinking for me.
  • I won’t forget a key point, which can happen when I explain it to a developer in person.
  • The development team gets time to reflect on the feedback by themselves.
  • Discussion after the team has gone over the post is likely to be more productive.
  • I can keep updating this post.

Following are the reasons I think we should avoid exposing the database model as an API contract.

Continue reading “Web API Design Anti-Pattern: Exposing your database model”

Useful Stuff I Read This Week

Here are 7 posts I thought were worth sharing this week.

Google: A Collection Of Best Practices For Production Services

This is an amazing read. Building resilient systems is hard, and the first step is to become aware of the practices used in the trenches. All the practices are worth knowing, and you should look for opportunities to apply them in your environment. Every few weeks I see teams struggling to make configuration changes safely; the article gives some practical advice on this. Writing fail-safe, resilient HTTP clients is not easy, and HTTP clients are used heavily in Microservices architectures. Most developers only consider the happy path, where the service either succeeds or fails with expected response codes. But we also need to consider retries with jitter, timeouts, queueing, load shedding, etc. when building HTTP clients. This article covers a few more practices that can help us build resilient systems.
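As one example of going beyond the happy path, here is a minimal Python sketch of retries with capped exponential backoff and "full jitter". It is a generic illustration, not code from the article. `call_with_retries` and `RetryableError` are hypothetical names. The per-attempt timeout is assumed to live inside `op` (for example, by passing `timeout=` to `urllib.request.urlopen`). Queueing and load shedding are server-side concerns and are not shown. Retries are only safe for idempotent requests.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 503, ...)."""


def call_with_retries(
    op: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Run `op`, retrying RetryableError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Backoff ceiling doubles each attempt (0.1s, 0.2s, 0.4s, ...) up to max_delay_s.
            ceiling = min(max_delay_s, base_delay_s * (2 ** attempt))
            # Full jitter: sleep a random time in [0, ceiling] so clients that
            # failed together do not retry in lockstep and re-overload the server.
            sleep(rng.uniform(0.0, ceiling))
    raise AssertionError("unreachable")


# Usage: an operation that fails twice with a simulated 503, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RetryableError("simulated 503")
    return "ok"


delays = []
print(call_with_retries(flaky, sleep=delays.append, rng=random.Random(7)))  # ok
```

Injecting `sleep` and `rng` keeps the backoff logic testable without real waiting. The jitter matters most when many clients see the same outage at once.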

Continue reading “Useful Stuff I Read This Week”