newsletter – Page 3 – Shekhar Gulati

Issue #31: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 200 minutes.

A liar will not be believed, even when he speaks the truth – Aesop

How to remove duplicate lines from files keeping the original order: 15 mins read. I finally learnt something about awk. The post explains how you can remove duplicate lines in a file while preserving their order. This is deduplication on steroids. Learning awk properly is still on my to-do list.
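A minimal Python sketch of the same idea, not the awk from the post: remember every line already seen and emit a line only the first time it appears. This mirrors the common awk idiom `!seen[$0]++`; the function name `unique_lines` and the sample data are illustrative assumptions.

```python
from typing import Iterable, Iterator


def unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time it appears, preserving input order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


sample = ["b", "a", "b", "c", "a"]
print(list(unique_lines(sample)))  # ['b', 'a', 'c']
```

The set holds every distinct line in memory, which is the same trade-off the awk associative array makes.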
Google’s Chrome Becomes Web ‘Gatekeeper’ and Rivals Complain: 15 mins read. I have read this multiple times. Chrome is at the core of Google’s digital strategy. Google needs to track us to show ads and make money. This is the reason they are coming up with an updated Chrome Extension API that will limit what ad blockers can do. In my view, the big problem is not Chrome or Google. We have ads because people want to earn money from their content. Google does not put ads on sites magically; site owners add Google ad tracking scripts that share information with Google. Until we create a better financial model for content creators, this problem can’t be solved. Brave, the browser from Brendan Eich (co-founder of Mozilla and current CEO of Brave Software Inc.), is trying to do some work on this, but it is still early days.
Tests that sometimes fail: 30 mins read. The author makes the following valid points:
1. Flaky tests are useful for finding underlying flaws in our application. In some cases when fixing a flaky test, the fix is in the app, not in the test.
2. Common patterns of flaky tests
  1. Flaky tests caused by hard coded ids because they rely on database sequences
  2. Making bad assumptions about DB ordering. Result returned by SQL query is unordered.
  3. Incorrect assumptions about time
  4. Bad assumptions about the environment
3. Mitigation patterns
  1. Run test suite in a tight loop, over and over again on a cloud server. Each time tests fail we flag them and at the end of a week of continuous running we mark flaky specs as “skipped” pending repair.
  2. One big issue with flaky tests is that quite often they are very hard to reproduce. To accelerate a repro I tend to try running a flaky test in a loop.
  3. Invest in a fast test suite.
  4. Add purpose-built diagnostic code to debug flaky tests you cannot reproduce.
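The "run it in a loop" advice above can be sketched in a few lines of Python. This is an illustration under assumptions, not the author's tooling: `count_failures` and `make_flaky_test` are hypothetical helpers, and the shuffled list stands in for a SQL query that has no ORDER BY.

```python
import random
from typing import Callable


def count_failures(test: Callable[[], None], runs: int = 100) -> int:
    """Run `test` repeatedly and count how many runs raise AssertionError."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures


def make_flaky_test(rng: random.Random) -> Callable[[], None]:
    """Build a hypothetical test that wrongly assumes a fixed row order."""
    def test() -> None:
        rows = [1, 2, 3]
        rng.shuffle(rows)  # stands in for a query without ORDER BY
        assert rows[0] == 1  # bad assumption: first row is always row 1
    return test


flaky = make_flaky_test(random.Random(0))
print(count_failures(flaky, runs=200), "failures out of 200 runs")
```

A test that fails in some runs but not all is the flaky signature; a single run would often miss it.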
You need neither PWA nor AMP to make your website load fast: 10 mins read. The author writes, “why was AMP needed? Well, basically Google needed to lock content providers to be served through Google Search. But they needed a good cover story for that. And they chose to promote it as a performance solution”. I mostly agree with the author that AMP hurts the web community more than it helps. I have disabled AMP on my blog.
Fast key-value stores: An idea whose time has come and gone: 30 mins read. Interesting paper by Google on building stateful services instead of stateless ones. I also went with a stateful service architecture in my last application. It has its own challenges, but in some cases it is the only viable option.
6 new ways to reduce your AWS bill with little effort: 10 mins read. This post can help you save some $$$ on your monthly AWS bill. The author suggests 6 ways to reduce it. Out of the 6, I found the following two worth a try:
1. Use EC2 AMD instances
2. Use VPC endpoints instead of NAT gateways
Disaster Tolerance Patterns Using AWS Serverless Services: 30 mins read. Just read it if you are using AWS.
How Far Out is AWS Fargate?: 15 mins read. This is a good post comparing AWS Fargate and AWS Lambda.
1. With Lambda you pay per invocation and the price is based on the memory you allocate for your function (up to 3GB) and its execution time. The amount of compute available to your Lambda function is based on its memory allocation. This pricing model is ideal for workloads that have spikes and/or long periods of downtime.
2. Fargate, on the other hand, lets you independently configure how many vCPUs (up to 4) and GBs of memory (up to 30GB) you want your Fargate tasks to have. It is billed per second with a one-minute minimum.
Learning to Listen to one’s own Boredom: 15 mins read. All of us need to learn to develop a ‘late style’ – ideally as early on in our lives as possible: a way of being wherein we shake off the dead hand of habit and social fear and relearn to listen to what entertains us.
How We Built a Content-Based Filtering Recommender System For Music with Python: 30 mins read. I love this kind of tutorial, which helps you learn by building an application step by step. Give it a try and you will learn something about building a content-based recommender system for music.

Issue #30: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 175 minutes.

Do every act of your life as though it were the very last act of your life – Marcus Aurelius

Learn more programming languages, even if you won’t use them: 10 mins read. I first got this advice a few years back when I watched a two-minute video by Bjarne Stroustrup, creator of C++. He said you should not call yourself a programmer if you only know one programming language. The magic number he mentioned in the video was 5. This post makes the same point. Different programming languages are good at different things. Every programming language makes a tradeoff. They help you think about a problem in a different way. I got the hang of functional programming once I learnt the basics of Scheme. I try to learn a new programming language every couple of years. I need to start using them in my side projects.
Announcing AMP Real URL: 20 mins read. In case you are not aware, AMP stands for Accelerated Mobile Pages. AMP is an open source standard led by Google that helps speed up access to websites by caching the content close to the user. This is good for readers, but for content producers there were a few issues. The biggest issue with AMP is that the rendered webpage has a URL starting with https://google.com/amp/. Users have become used to looking at the navigation bar in a web browser to see what website they are visiting. The AMP cache breaks that experience. In this post, the Cloudflare authors explain how they fixed the real origin URL problem with AMP using web packaging and Cloudflare Workers.
Infrastructure as Code, Part One: 15 mins read. This is an introductory read on infrastructure as code. If you are not aware of it then you should give it a read. It is a nicely written introduction to IaC.
When rules don’t apply: 30 mins read. This is a 30-minute video that talks about how executives at Apple, Google, eBay, Intuit, and other big tech companies conspired against their own employees by secretly agreeing among themselves not to hire each other’s employees. Tech companies treat their employees as their assets and cheat them.
Designing a modern serverless application with AWS Lambda and AWS Fargate: 20 mins read. A lot of good ideas in this post on how to build modern applications. The key points for me in this post are:
1. You should use different compute services based on the use case. The post talks about why the author used both AWS Lambda and AWS Fargate. For short computation jobs use AWS Lambda, and for long compute jobs that have no designated end use AWS Fargate.
2. Give some thought to the isolation model when deciding which compute service to use. AWS Lambda compute instances are isolated from each other, so if one goes rogue the rest of your application will not suffer.
3. When you are building a side project or an MVP for your startup, your goal should be to minimise maintenance and operational tasks. Serverless services help you do that.
4. AWS CDK is a framework that allows you to write IaC in your preferred programming language.
Thundering Herds & Promises: 10 mins read. I love this kind of post, which shares how a team solved a real-world technical problem. This post covers how Instagram solved the thundering herd problem in their cache using promises. The quote below explains what the thundering herd problem means:
> If your cache is hit with 100 concurrent requests, then, since the cache is empty, all of them will get a cache-miss at the one moment, resulting in 100 individual requests to the backend. If the backend is unable to handle this surge of concurrent requests (ex: capacity constraints), additional problems arise. This is what’s sometimes called a thundering herd.
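The promise-based fix can be sketched with Python's asyncio: cache the pending future instead of the finished value, so every concurrent miss awaits the same backend call. This is a minimal illustration of the technique, not Instagram's implementation; `PromiseCache`, `slow_backend`, and the key `user:1` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class PromiseCache:
    """Cache in-flight futures so concurrent misses share one backend call."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]]) -> None:
        self._fetch = fetch
        self._entries: Dict[str, asyncio.Future] = {}

    async def get(self, key: str) -> str:
        if key not in self._entries:
            # The first caller stores the pending task; later callers reuse it.
            # Note: a failed fetch stays cached in this sketch (no eviction).
            self._entries[key] = asyncio.ensure_future(self._fetch(key))
        return await self._entries[key]


async def demo() -> int:
    calls = 0

    async def slow_backend(key: str) -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulate backend latency
        return f"value-for-{key}"

    cache = PromiseCache(slow_backend)
    results = await asyncio.gather(*(cache.get("user:1") for _ in range(100)))
    assert all(r == "value-for-user:1" for r in results)
    return calls


backend_calls = asyncio.run(demo())
print(backend_calls)  # 1
```

With a plain value cache, all 100 requests would miss and hit the backend; here only the first one does.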
The Good and the Bad of Google Cloud Run: 10 mins read. The key point made in this post is that Google Cloud Run is not FaaS. Google Cloud Run allows developers to push container images with an HTTP server to GCP, and GCP takes care of running them at scale. If you have built a pure serverless application, you will know that its architecture is event-driven and service-full. This forces developers to think about applications in a different way. According to the author, Cloud Run provides a safety blanket for developers intimidated by the paradigm shift of FaaS and service-full architecture.
Azure Cosmos DB: Microsoft’s Cloud-Born Globally Distributed Database: 20 mins read. This is a detailed explanation of Azure’s Cosmos DB internals. This article was too technical and detailed for me. I will try to re-read it to better grasp the underlying details of Cosmos DB.
How to Improve Your Memory (Even if You Can’t Find Your Car Keys): 10 mins read. This post by Adam Grant talks about how to improve your retention power. The key points are:
1. Take rest after learning a new concept.
2. Don’t re-read stuff.
3. Try to do a small quiz on what you have learnt, or try to explain it to someone.
I apply a similar technique in my newsletter by summarising, in my own words, what I have learnt from each post.
An Overview of Go’s Tooling: 30 mins read. This is the post that you should bookmark if you are a Go developer. The post covers most of the Go tools a developer needs to interact with. I wish more such posts were written for other languages as well.

Issue #26: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 180 minutes.

Wealth is the ability to fully experience life. — Henry David Thoreau

Don’t get clever with login forms: 10 mins read. This post raises a valid concern about clever login forms. Through a set of examples, the author explains why clever login forms end up confusing users. Another example of a clever login experience that the author does not cover is https://login.microsoftonline.com . I agree with the author’s recommendations for a login page:
1. Have a dedicated page for login
2. Expose all required fields
3. Keep all fields on one page
4. Don’t get fancy.
Why Google Needed a Graph Serving System: 30 mins read. In this post, the author shares his story of building a distributed graph database that can answer queries involving relationships. The post goes over various graph-based systems developed at Google and why Google failed to build a distributed graph database that does not suffer from the depth join problem. This post highlights an interesting point about Google’s struggle to build innovative solutions because of internal politics. Building a distributed graph database that does not suffer from the depth join problem is a herculean task. Dgraph, an open source database developed by the author along with others in the community, is trying to be such a system.
You probably don’t need a single-page application: 10 mins read. I agree entirely with the author that the best way to build a web application is somewhere in the middle, i.e. building hybrid apps. Build an SPA only for the parts where you need rich interaction and keep most other pages server rendered.
Google wants Cloud Services Platform to Borg your datacenter: 20 mins read. This post gives insight into why Google made the move to build and open source Kubernetes. Google knew they were going to have a hard time beating AWS and Azure. So, they built and released Kubernetes and hoped it would become a successful project with a big community. This means the cloud just became an implementation detail, and most big enterprises started considering Kubernetes as the software of choice for building a modern hybrid datacenter. Google’s Cloud Services Platform (CSP) will give enterprises a hardened Kubernetes, Istio, and Knative software distribution. CSP is going to be a game changer for Google Cloud. Also, many OpenShift users might consider going for CSP. Interesting times ahead!
Four Techniques Serverless Platforms Use to Balance Performance and Cost: 30 mins read. This is the best article I have read on Serverless. It starts by helping the reader understand the architecture of a Serverless platform and then talks about the elephant in the room: the cold start problem associated with Serverless platforms. The article covers four techniques that are employed by different Serverless platforms to overcome the cold start issue. The techniques mentioned in the post are the following:
1. Function resource sharing
2. Function resource pooling
3. Function prefetching
4. Function prewarming.
Lessons from 6 software rewrite stories: 20 mins read. Another amazing read for this week. Through real examples, this post explains when it is fine to rewrite software. If you have been building software for long, you will have come across the advice by Joel Spolsky that rewriting software is the single worst strategic mistake that a software company can make. The author tells the other side of the story in this post. The key takeaway from the post is:
1. Once you’ve learned enough that there’s a certain distance between the current version of your product and the best version of that product you can imagine, then the right approach is not to replace your software with a new version, but to build something new next to it — without throwing away what you have.
How to build a distributed throttling system with Nginx + Lua + Redis: 15 mins read. This post covers how to build an API rate limiting system with Nginx, Lua, and Redis. The instructions in the post are clear and to the point.
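To make the idea of rate limiting concrete, here is a minimal fixed-window counter in plain Python. It is a sketch under assumptions, not the Nginx, Lua, and Redis setup from the post: an in-memory dict stands in for the shared store, the algorithm choice is mine, and `FixedWindowLimiter` and `client-a` are hypothetical names.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # In a distributed setup this dict would be a shared store.
        # Old windows are never evicted in this sketch.
        self.counters: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        key = (client_id, window)
        self.counters[key] += 1
        return self.counters[key] <= self.limit


# A fake clock keeps the demonstration deterministic.
now = [0.0]
limiter = FixedWindowLimiter(limit=3, window_seconds=60, clock=lambda: now[0])
first_window = [limiter.allow("client-a") for _ in range(5)]
now[0] = 61.0  # move into the next window
next_window = limiter.allow("client-a")
print(first_window, next_window)  # [True, True, True, False, False] True
```

Fixed windows are simple but allow bursts at window boundaries; sliding-window variants trade extra bookkeeping for smoother limits.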
Monte Carlo Simulation with Python: 20 mins read. The post explains Monte Carlo simulation using a simple but realistic example. As per Wikipedia:
> Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. Their essential idea is using randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.
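As a small illustration of repeated random sampling (a textbook example, not the one used in the post), the sketch below estimates pi by drawing random points in the unit square and counting how many land inside the quarter circle. The function name `estimate_pi` and the seed are arbitrary choices.

```python
import random


def estimate_pi(samples: int, seed: int = 42) -> float:
    """Estimate pi from the share of random points inside a quarter circle."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of unit square = pi / 4.
    return 4.0 * inside / samples


print(estimate_pi(100_000))
```

The estimate gets closer to pi as the number of samples grows, with the error shrinking roughly with the square root of the sample count.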
5 Ways To Process Feedback At Work Without Triggering A Stress Response: 10 mins read. This post covers an important aspect of professional life: taking feedback. The author suggests the following:
1. Keep an open mind about receiving feedback. Focus on how your work can be improved with some extra perspective.
2. Don’t respond right away, take a few seconds to really process the feedback. You can assess rationally and logically, without undue emotion.
3. Make sure you understand the feedback. In cases where you don’t, ask questions! The feedback giver should be happy to discuss specific points deeper to help clarify their suggestions.
4. Be humble and gracious! Let them know you appreciate that they gave their time and energy to help make you more successful.
5. Don’t let constructive criticism go in one ear and out the other. Take what you hear, implement it, and follow-up.
How to Organize your Monolith Before Breaking it into Services: 15 mins read. This post talks about an intermediary stage between a monolith and microservices – a monolith organized by domain without the entanglement or fragility of our original codebase. I agree entirely with the author that we should start with a monolith and modularise the application by subdomain, applying DDD principles. If required in future, we can easily turn these subdomain modules into services. It is great to read posts like this, as they provide valuable information that is usually missing from most posts found on the web.

Issue #25: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 150 minutes.

The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn and relearn. — Alvin Toffler

The Hard Truth About Innovative Cultures: 20 mins read. This post answers a question that I was struggling to find the right answer to. If there is one post that you should read this week, it should be this one. From the post:
> A tolerance for failure requires an intolerance for incompetence. A willingness to experiment requires rigorous discipline. Psychological safety requires comfort with brutal candor. Collaboration must be balanced with individual accountability. And flatness requires strong leadership. Innovative cultures are paradoxical. Unless the tensions created by this paradox are carefully managed, attempts to create an innovative culture will fail.
When AWS Autoscale Doesn’t: 15 mins read. This post by folks at Segment shares valuable lessons on AWS autoscaling. The key points for me in the post are:
1. AWS autoscaling for ECS follows the formula new_task_count = current_task_count * ( actual_metric_value / target_metric_value ). The ratio actual_metric_value / target_metric_value limits the magnitude of a scale-out event. To overcome this, you either have to reduce the target value, which leaves you over-scaled all the time, or use a custom CloudWatch metric.
2. The default cooldown for a scale-out event is 3 minutes and the cooldown for a scale-in event is 5 minutes.
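A worked example of the formula quoted above, as a small Python sketch. The numbers are made up for illustration, and rounding up to a whole task is my assumption, not something stated in the post.

```python
import math


def new_task_count(current_task_count: int,
                   actual_metric_value: float,
                   target_metric_value: float) -> int:
    """Apply the scaling formula; rounding up is an assumption of this sketch."""
    ratio = actual_metric_value / target_metric_value
    return math.ceil(current_task_count * ratio)


# 10 tasks at 90% CPU with an 80% target: 10 * (90 / 80) = 11.25
print(new_task_count(10, 90.0, 80.0))   # 12

# A metric capped at 100% bounds the ratio at 100 / 80 = 1.25,
# so one scale-out step adds at most 25% more tasks.
print(new_task_count(10, 100.0, 80.0))  # 13
```

This shows why a capped metric such as CPU utilisation limits how far a single scale-out event can go, however overloaded the service really is.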
Multiply your time by asking 4 questions about the stuff on your to-do list: 10 mins read. This post won’t tell you how to magically make each day 38 hours long (we’re still working on that). But by assessing our tasks in terms of their significance, we can free up more time tomorrow.
Dotfile madness: 10 mins read. I just counted: my home directory has more than 30 hidden directories. The post makes a valid argument against the proliferation of dot files and dot directories. The author writes:

> Avoid creating files or directories of any kind in your user’s $HOME directory in order to store your configuration or data. This practice is bizarre at best and it is time to end it. I am sorry to say that many (if not most) programs are guilty of doing this while there are significantly better places that can be used for storing per-user program data.
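One widely used convention for such better places on Linux is the XDG Base Directory specification, which puts per-user configuration under `$XDG_CONFIG_HOME` (default `~/.config`) and data under `$XDG_DATA_HOME` (default `~/.local/share`). A minimal sketch, assuming that convention and a hypothetical application name `myapp`:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def config_dir(app_name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the per-user config directory following the XDG convention."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to ~/.config.
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / app_name


def data_dir(app_name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the per-user data directory following the XDG convention."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to ~/.local/share.
    base = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(base) / app_name


print(config_dir("myapp"))
print(data_dir("myapp"))
```

The `env` parameter exists only so the lookup can be exercised without touching the real environment.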
Life of a SQL query: 15 mins read. What happens when you run a SQL statement? We follow a Postgres query transformation by transformation as a query is processed and results are returned.
Splitting Up a Codebase into Microservices and Artifacts: 10 mins read. This is the first post that you should read if you are thinking about Microservices. I like the way this post first talks about using module boundaries to split the code base. Only if module boundaries are not enough should you think about Microservices. In my opinion, you should choose Microservices 1) to scale your engineering organization, or 2) when your business problem genuinely needs a polyglot environment.
Golang Datastructures: Trees: 20 mins read. This is an awesome read even if you can’t comprehend Golang. This beautifully written post explains how to implement a simple DOM tree in Golang. It implements find functionality using both breadth-first search and depth-first search. I thoroughly enjoyed this post.
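For readers who do not write Go, here is a rough Python sketch of the same two searches on a tiny DOM-like tree. It is not a port of the post's code; the `Node` class, its fields, and the sample ids are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    tag: str
    id: str = ""
    children: List["Node"] = field(default_factory=list)


def find_bfs(root: Node, target_id: str) -> Optional[Node]:
    """Breadth-first search: visit the tree level by level using a queue."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.id == target_id:
            return node
        queue.extend(node.children)
    return None


def find_dfs(root: Node, target_id: str) -> Optional[Node]:
    """Depth-first search: fully explore each child before its siblings."""
    if root.id == target_id:
        return root
    for child in root.children:
        found = find_dfs(child, target_id)
        if found is not None:
            return found
    return None


root = Node("html", children=[
    Node("body", id="main", children=[
        Node("div", id="content", children=[Node("p", id="intro")]),
        Node("div", id="footer"),
    ]),
])
print(find_bfs(root, "footer").tag, find_dfs(root, "intro").tag)  # div p
```

Both functions return the same node for a unique id; they differ only in the order in which nodes are visited.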
Deploying Python ML Models with Flask, Docker and Kubernetes: 30 mins read. This is an extensive tutorial that shows how to deploy Python Flask applications on Kubernetes. It covers how to deploy Machine Learning (ML) models into production environments by exposing them as RESTful API Microservices hosted within Docker containers, which are in turn deployed to a cloud environment.
A Minimalistic Guide to Kata Containers: 5 mins read. This is a short post that I wrote about Kata Containers. Kata Containers provide the best of containers and virtual machines. Read the post to learn more.
Building a Better Profanity Detection Library with scikit-learn: 15 mins read. This post covers how you can write your own profanity filter using machine learning. The author starts by giving reasons why he didn’t use existing profanity libraries and then he goes over the steps required to create your own profanity detection library.

https://www.youtube.com/watch?v=6oPj-DW09DU

Issue #22: 10 Reads, A Handcrafted Weekly Newsletter For Humans

The time to read this newsletter is 150 minutes.

Curiosity is, in great and generous minds, the first passion and the last – Samuel Johnson

Monorepos: Please don’t!: 20 mins read. In this post, Matt Klein explains why the monorepo approach does not provide the benefits most often cited by its proponents. His recommendation is to go with a polyrepo structure. The post makes four valid arguments:
1. Organizations that use monorepo spend considerable engineering resources on building tools to work with monorepos. Most organisations don’t have such luxury.
2. A monorepo makes it difficult to open source internal projects, as you have a single commit history.
3. Most VCS are not meant to be used for large monolithic repositories. There is some work done by Microsoft as part of its VFS for Git project, but it has some rough edges.
4. The last interesting point that the post makes is: “The frank reality is that, at scale, how well an organization does with code sharing, collaboration, tight coupling, etc. is a direct result of engineering culture and leadership, and has nothing to do with whether a monorepo or a polyrepo is used.”
The main drawback of the polyrepo approach is that it creates a culture where different teams own different parts of the code. Few people in the organization are aware of the big picture. This point is beautifully put by Adam Jacob in his post, Monorepo: please do!.

My take on this is somewhere in between the two approaches. For example, if I am building a web application, I like to keep all backend Microservices in one repository and the front-end application in another. Taking either approach too far does not work.

Issue #21: 10 Reads, A Handcrafted Weekly Newsletter For Humans

The time to read this newsletter is 165 minutes.

He who learns but does not think, is lost! He who thinks but does not learn is in great danger — Confucius

Stop Learning Frameworks: 15 mins read. This post makes a great point that we should focus more on gaining deep understanding of software development fundamentals than learning new frameworks. Frameworks come and go and most of the time knowledge you gain is not portable. Focussing on core software development topics like algorithms, unit testing, design patterns, HTTP, DDD, etc will give you better return for your time. If you encounter a new technology at work then you will learn it anyhow.
Another post on similar lines that I read this week said the stack is irrelevant. The author writes:
> I’ve started with almost no knowledge about the stack and been up to speed in less than 3 months or so. This is why I always tell people to pick up technology and language agnostic problem solving skills because those are the only skills transferable across stacks.
Analyzing Hacker News book suggestions in Python: 15 mins read. A quick tutorial that will teach you how to do simple text processing in Python. This tutorial covers how to find the top book recommendations in an HN thread. The approach used is simple and most programmers can easily follow it.
The Major Features in Postgres 11: 20 mins read. This is a 24-page PDF slide deck covering important features introduced in Postgres 11. My favourite features from the list are:
1. Partitioning table by hash
2. Parallel hash joins
3. Finer-Grained access control
Materialized views vs. Rollup tables in Postgres: 15 mins read. I loved this post for being clear and to the point. It teaches when to use materialized views and when to use rollup tables. I can see myself using this in the near future. Great post!
How to Get Over Productivity Guilt: 10 mins read. Just the kind of post that you need to end your year. I feel productivity guilt on many days. I am trying to get better at handling it. Otherwise, it kills all your happiness and you end up doing nothing. Prioritize the essential tasks that you need to do and get them done. Life will be good. Don’t overstress yourself.
The business case for serverless: 10 mins read. I spent a few months building an application using Serverless. I used the AWS stack to build the application. I enjoyed the Serverless experience, but I found that the pace of development goes down as you are not able to do end-to-end testing on your local machine. This post does a good job of making a business case for Serverless. The main point the author makes is that Serverless increases developer velocity. The author concludes the post by writing:
One day, complexity will grow past a breaking point and development velocity will begin to decline irreversibly, and so the ultimate job of the founder is to push that day off as long as humanly possible. The best way to do that is to keep your ball of mud to the minimum possible size— serverless is the most powerful tool ever developed to do exactly that.
Another interesting post on Serverless that I read this week is by Amazon’s Tim Bray, titled Serverless Everything. He proposes an interesting way to look at Serverless using a data plane and control plane analogy. Give it a read, it is not that long.
> The Amazon MQ service, which is a managed version of the excellent Apache ActiveMQ open-source message broker. To make this usable by AWS customers, we had to write a bunch of software to create, deploy, configure, start, stop, and delete message brokers. In this sort of scenario, ActiveMQ itself is called the “data plane” and the management software we wrote is called the “control plane”. The control plane’s APIs are RESTful and, in Amazon MQ, its implementation is entirely serverless, based on Lambda, API Gateway, and DynamoDB.
Benchmark PostgreSQL With Linux HugePages: 15 mins read. PostgreSQL is my first choice when it comes to RDBMS. This post covers how we can configure Linux HugePages to improve the performance of a Postgres database. The author concludes the post by writing:
> One of my key recommendations is that we must keep Transparent HugePages off. You will see the biggest performance gains when the database fits into the shared buffer with HugePages enabled. Deciding on the size of huge page to use requires a bit of trial and error, but this can potentially lead to a significant TPS gain where the database size is large but remains small enough to fit in the shared buffer.
How we built Globoplay’s API Gateway using GraphQL: 15 mins read. This post starts with the reason why Globo’s team decided to use GraphQL instead of REST for building their API. The main reason outlined in the post for choosing GraphQL is the ease with which you can support different requirements for different devices. After covering why GraphQL, the post talks about how you can get started with it.
> In mid 2018, we had two backends for frontends (BFF) doing very similar tasks: One for web, and another for iOS, android and TV. As much as I love the “backend for frontend” idea (and how cool it sounds), we could not keep the current architecture. Not only because of the reasons I just said, but because each BFF was serving slightly different content to its clients while the business team started to ask for something new: Ubiquity among all clients. The more I reviewed everything we needed to support, the more GraphQL started to make sense. While TVs need a big program poster, mobiles need a small one. We need to show exactly the same video duration among all clients. TVs should provide detailed information about each program, but iOS and android could show only a poster + program title.
Envoy Proxy at Reddit: 20 mins read. The post goes into depth on why Reddit moved to Envoy for service-to-service communication. What I like in this post is how they incorporated a new technology step by step rather than taking a big-bang approach. It is a well-written post, so you will end up learning a lot about how big sites like Reddit introduce new technologies into their ecosystem and how their architecture evolves over time. The post outlines three requirements the Reddit team had for their service mesh choice. Envoy and its ecosystem fit all of these requirements.
1. Performance: Avoid adding a performance bottleneck at all costs. Any performance losses at the proxy level need to be offset by considerable feature gains. The two biggest considerations here were resource utilization and latency impact. Our mesh approach accounts for a sidecar proxy on every host, so we wanted the solution to be one that we were comfortable running on every host and at every hop in the network.
2. Features: The biggest differentiator among the options was the possibility of L7 Thrift support in the proxy. Thrift is our main inter-service RPC protocol and without first-class support for the behavior control we want in a service mesh, it wouldn’t make sense to switch to something that would just be providing the same basic TCP load balancing we’re getting out of HAProxy. We’ll address this in the next section.
3. Integrations and Extensibility: Being able to contribute or request integrations and possibly extend out-of-the-box functionality was also a core requirement. The network proxy needed to be able to evolve with Reddit’s service needs and developer feature requests.
Going Head-to-Head: Scylla vs Amazon DynamoDB: 30 mins read. This post compares ScyllaDB with Amazon DynamoDB. The author writes:
> Scylla is a drop-in replacement for Cassandra, implemented from scratch in C++. Cassandra itself was a reimplementation of concepts from the Dynamo paper. So, in a way, Scylla is the “granddaughter” of Dynamo. That means this is a family fight, where a younger generation rises to challenge an older one. It was inevitable for us to compare ourselves against our “grandfather,” and perfectly in keeping with the traditions of Greek mythology behind our name.
The conclusion from the post says it all:
1. DynamoDB failed to achieve the required SLA multiple times, especially during the population phase.
2. DynamoDB has 3x-4x the latency of Scylla, even under ideal conditions
3. DynamoDB is 7x more expensive than Scylla
4. Dynamo was extremely inefficient in a real-life Zipfian distribution. You’d have to buy 3x your capacity, making it 20x more expensive than Scylla
5. Scylla demonstrated up to 20x better throughput in the hot-partition test with better latency numbers
6. Last but not least, Scylla provides you freedom of choice with no cloud vendor lock-in (as Scylla can be run on various cloud vendors, or even on-premises).

I will end this newsletter with a short video, The Electronic Coach. This short video shows how Donald Knuth built a mainframe program that helped a basketball team win 11 out of 14 matches. This is an early example of a computer being used in data-driven decision making. In case you don’t know Donald Knuth, he is one of the greatest computer scientists. His book The Art of Computer Programming was included by American Scientist on its list of books that shaped the last century of science.