Useful Stuff I Read This Week

Here are 9 posts I thought were worth sharing this week:

Talk: A History of Clojure – Link

An amazing talk by Rich Hickey on the history of Clojure. In this video, Rich covers how he took a two-year self-funded sabbatical to build the first version of Clojure. Clojure is a dialect of Lisp, a functional programming language and the second-oldest high-level programming language still in common use. He also covers the challenges he faced in getting the first customers and how he supported the community via email and IRC. I liked how he defined constraints for the language and how he looked for practical answers to real-world questions related to performance. I played with Common Lisp for a month or so while reading SICP. I will try Clojure soon as well.

Tips for High Availability – Link

This is a post from the Netflix engineering team about how their CD practices enable high availability. Honestly, I was expecting to read more about system design. Still, I enjoyed this post. They talk about practices such as blue/green deployments, deployment windows, canary testing, rollbacks, and a few others. If you have been in the DevOps space for a while, you probably already know most of them. I think the key with these practices is not knowing them but actually applying them at your scale. You really need a solid engineering culture to practice them.

How percentile approximation works – Link

This is a long read. The post talks about why percentiles (p10, p50, p95, p99) are better than averages. I have known this for many years, but the detailed explanation in this post makes it intuitive. Percentiles handle outliers much better than averages. A single outlier can drag the average far to the left or right on your curve. Percentiles are more robust because they depend on the rank of a value in the sorted data, not on its magnitude. After covering the basics, the post talks about the challenges of implementing percentiles and how TimescaleDB uses approximate percentiles to mitigate those challenges. This is the kind of post that you will have to read multiple times to completely understand.
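To make the outlier point concrete, here is a minimal Python sketch. The latency numbers are made up for illustration, and `percentile` is my own nearest-rank helper, not the approximation TimescaleDB uses.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the data is less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
baseline = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
# The same data plus one very slow request.
with_outlier = baseline + [5000]

mean_before = statistics.mean(baseline)      # 13.5
mean_after = statistics.mean(with_outlier)   # roughly 467
p50_before = percentile(baseline, 50)        # 13
p50_after = percentile(with_outlier, 50)     # 14

print(f"mean: {mean_before} -> {mean_after:.1f}")
print(f"p50:  {p50_before} -> {p50_after}")
```

One slow request moves the mean from 13.5 ms to roughly 467 ms, while p50 only moves from 13 ms to 14 ms. The outlier still shows up where it belongs, in the high percentiles such as p99.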

Coverage Is Not Strongly Correlated with Test Suite Effectiveness – Link

I have been saying this for many years. Code coverage is at best weakly correlated with test quality, but since it is the easiest way to make a team write tests, most teams use it. Code coverage can help you find under-tested parts of a program. That’s it. Beyond that it has limited relevance.

Talk: Four Distributed Systems Architectural Patterns – Link

It is an old (2017) talk that I stumble upon at frequent intervals through articles, newsletters, or podcasts. I had watched it before, and this week I rewatched it. It is a good talk and I am sure you will enjoy it as well. It covers four architecture patterns for distributed systems:

N-tier architecture
Sharded architecture
Lambda architecture
Streaming architecture

How Netflix Scales its API with GraphQL Federation – Link

I finally understood GraphQL federation. GraphQL is commonly used to build the API aggregation layer (BFF or any other orchestration layer) in a microservices architecture. In the Netflix setup, this API aggregation layer was managed by a central team. The problem was that this central team lacked domain knowledge, which led to poor schema health. With GraphQL federation, you can shift the responsibility for GraphQL resolvers to the domain teams. They can extend the types and expand the object graph.
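The ownership idea can be sketched in plain Python. This is a toy model and not real GraphQL or the federation spec: `Gateway`, the team names, and the `Movie` fields are all made up for illustration. The point is only that each domain team registers resolvers for the fields it owns, and the gateway stitches them into one object.

```python
class Gateway:
    """Toy stand-in for a federated gateway: it owns no domain logic,
    it only routes each field to the team that registered it."""

    def __init__(self):
        # (type_name, field_name) -> (owning_team, resolver)
        self._resolvers = {}

    def register(self, team, type_name, fields):
        for field_name, resolver in fields.items():
            key = (type_name, field_name)
            if key in self._resolvers:
                owner = self._resolvers[key][0]
                raise ValueError(f"{type_name}.{field_name} is already owned by {owner}")
            self._resolvers[key] = (team, resolver)

    def resolve(self, type_name, entity_id, field_names):
        result = {}
        for field_name in field_names:
            _team, resolver = self._resolvers[(type_name, field_name)]
            result[field_name] = resolver(entity_id)
        return result


gateway = Gateway()

# Hypothetical catalog team defines the base Movie type.
gateway.register("catalog", "Movie", {"title": lambda movie_id: "Example Movie"})

# Hypothetical ratings team extends Movie with a field it owns.
gateway.register("ratings", "Movie", {"average_rating": lambda movie_id: 4.5})

movie = gateway.resolve("Movie", "m1", ["title", "average_rating"])
print(movie)
```

A client sees a single `Movie` object, while the knowledge of how to compute each field stays with the team that understands that domain.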

Why do I write tests? – Link

This is a short post on why to write tests. The author recommends that we should not skip tests while fixing bugs and shipping hotfixes. I totally agree with the author. Writing tests helps you understand the problem by reproducing it, avoid shooting in the dark, clear out your assumptions, get a quick way to verify your fix, know when you are done, and finally be confident that you will not reintroduce the same bug as long as you maintain your tests.
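A minimal Python sketch of that workflow, using a made-up version-comparison bug (all names here are hypothetical): first write a test that reproduces the bug, then fix the code until the same test passes.

```python
def is_newer_buggy(a, b):
    """Original code: compares version strings character by character."""
    return a > b


def is_newer_fixed(a, b):
    """Hotfix: compare the numeric parts of the versions instead."""
    def key(version):
        return tuple(int(part) for part in version.split("."))
    return key(a) > key(b)


def regression_test(is_newer):
    # Reproduces the reported bug: 1.10 should be newer than 1.9.
    assert is_newer("1.10", "1.9")
    assert not is_newer("1.9", "1.10")


def passes(implementation):
    """Run the regression test against one implementation."""
    try:
        regression_test(implementation)
        return True
    except AssertionError:
        return False


print(passes(is_newer_buggy))
print(passes(is_newer_fixed))
```

The test fails against the old code, which confirms that I understood the bug, and passes against the fix, which tells me I am done. Kept in the suite, it guards against the same bug coming back.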

The First Rule of Machine Learning: Start without Machine Learning – Link

I am in the same camp as the author. ML requires extensive data, resources, and skills, which most engineering teams lack. Most of the time a heuristic-based rule system can take you far. Once the effort required to maintain the heuristic-based rule system outweighs the effort required to build and deploy an ML system, you should go ahead with the ML system. This is the kind of practical advice we need from practitioners. A must-read if your organization or product team is pushing for ML/AI systems.
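As a rough illustration of what such a heuristic baseline might look like, here is a small Python sketch. The spam-flagging use case and every rule in it are my own made-up examples, not taken from the post.

```python
# Each rule is a (name, predicate) pair. Adding a rule is a one-line change.
RULES = [
    ("contains_link_shortener", lambda m: "bit.ly/" in m.lower()),
    ("shouting", lambda m: len(m) > 10 and m.isupper()),
    ("prize_keywords", lambda m: any(
        phrase in m.lower() for phrase in ("free prize", "you have won"))),
]


def classify(message):
    """Return a label plus the names of the rules that fired,
    so every decision is easy to explain and debug."""
    fired = [name for name, rule in RULES if rule(message)]
    label = "spam" if fired else "ok"
    return label, fired


print(classify("YOU HAVE WON A FREE PRIZE"))
print(classify("See you at lunch tomorrow"))
```

A system like this needs no training data and ships in an afternoon. When the rule list becomes painful to maintain, that is the signal to consider ML, and by then the rules have also given you a baseline to compare against.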

Cloudflare’s Disruption – Link

I have been following Cloudflare for the last couple of years. I have not used any of their products yet. But with the latest announcement of R2, an S3-compatible object storage product with no egress cost, there is now a Cloudflare storage service that orgs can seriously consider. S3 is the backbone of most AWS services. Given that Cloudflare now has an S3 equivalent, they can also start building other services on top of it. I know it is still early days for Cloudflare in comparison to AWS, Azure, and GCP, but they could still target becoming the fourth choice in the cloud ecosystem.
