Issue #31: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 200 minutes.

A liar will not be believed, even when he speaks the truth – Aesop

  1. How to remove duplicate lines from files keeping the original order: 15 mins read. Finally learnt something about awk. The post explains how you can remove duplicate lines in a file while preserving their order. This deduplication on steroids. It is in my todo list to learn awk one day.
  2. Google’s Chrome Becomes Web ‘Gatekeeper’ and Rivals Complain: 15 mins read. I have read this multiple times. Chrome is at the core of Google’s digital strategy. Google needs to track us to show ads and make money. This is the reason they are coming up with updated Chrome Extension API that will limit what ad blockers can do. In my view, the big problem is not Chrome or Google. We have ads because people want to earn money from their content. Google does not put ads magically; site owners add Google ad tracking scripts that share information with Google. Till the time, we don’t create a better financial model for content creators. This problem can’t be solved. Brave browser by Brendan Eich, co-founder of Mozilla and the current CEO of Brave Software Inc. is trying to do some work on it but it is still early days for it.
  3. Tests that sometimes fail: 30 mins read. Author makes following valid points:
    1. Flaky tests are useful at finding underlying flaws in our application. In some cases when fixing a flaky test, the fix is in the app, not in the test
    2. Common patterns of flaky tests
      1. Flaky tests caused by hard coded ids because they rely on database sequences
      2. Making bad assumptions about DB ordering. Result returned by SQL query is unordered.
      3. Incorrect assumptions about time
      4. Bad assumptions about the environment
    3. Mitigation patterns
      1. Run test suite in a tight loop, over and over again on a cloud server. Each time tests fail we flag them and at the end of a week of continuous running we mark flaky specs as “skipped” pending repair.
      2. One big issue with flaky tests is that quite often they are very hard to reproduce. To accelerate a repro I tend to try running a flaky test in a loop.
      3. Invest in fast test suite
      4. Add purpose built diagnostic code to debug flaky tests you can not reproduce
  4. You need neither PWA nor AMP to make your website load fast: 10 mins read. Author writes, “why was AMP needed? Well, basically Google needed to lock content providers to be served through Google Search. But they needed a good cover story for that. And they chose to promote it as a performance solution”. I kind of agree with author that AMP hurts the web community more than it helps. I have disabled AMP in my blog.
  5. Fast key-value stores: An idea whose time has come and gone: 30 mins read. Interesting paper by Google on building stateful services instead of stateless. I also went with stateful service architecture in my last application. It has its own challenges but in some cases it is the only viable option.
  6. 6 new ways to reduce your AWS bill with little effort: 10 mins read. This post can help you save some $$$ in your monthly AWS bill. The author suggests 6 ways we can reduce AWS bill. Out of the 6, I found following two ways worth a try:
    1. Use EC2 AMD instances
    2. Use VPC endpoints instead of NAT gateways
  7. Disaster Tolerance Patterns Using AWS Serverless Services: 30 mins read. Just read it if you are using AWS.
  8. How Far Out is AWS Fargate?: 15 mins read. This is a good post comparing AWS Fargate and AWS Lambda.
    1. With Lambda you pay per invocation and the price is based on the memory you allocate for your function (up to 30GB) and its execution time. The amount of compute available to your Lambda function is based on it’s memory allocation. This pricing model is ideal for workloads that have spikes and/or long periods of downtime.
    2. Fargate, on the other hand, lets you configure how many VCPUs (up to 8) and GBs of memory (up to 3GB) you want your Fargate tasks to have independently, priced by the secondrounded up to one minute.
  9. Learning to Listen to one’s own Boredom: 15 mins read. All of us need to learn to develop a ‘late style’ – ideally as early on in our lives as possible: a way of being wherein we shake off the dead hand of habit and social fear and relearn to listen to what entertains us
  10. How We Built a Content-Based Filtering Recommender System For Music with Python: 30 mins read. I love these kind of tutorial that help you learn by building an application in step by step manner. Give it a try and you will learn something about building a content-based recommender system for music.

Video of the week

Tools Software Engineers Should Know: MTR: A network diagnostic tool

We software developers are good at debugging code related issues but when it comes to issues that require fighting with infrastructure or network then we find ourselves in a difficult position. We can solve these issues if we know the right tool to use. I faced a similar position this week. I am starting a new series where every week or two I will write about a new tool that can help us debug these kind of issues.

This week I was debugging an issue where few requests to the destination server were timing out. These types of issues typically fall under networking errors and require you to use a network diagnostic tool. Most developers start diagnosing the issue using ping and traceroute tool. Both these tools are useful but you have to run them both together to debug the issue. Recently, I discovered MTR which combines ping and traceroute tools in a single tool. I found that most developers that I work with are not unaware of this tool so I decided to document it for future me and others.

mtr stands for My traceRoute. It is useful when you need to figure out number of hops to the destination server or latency at each hop. It also help you see packet loss at each hop so that you can narrow down the place where you might be facing issue. MTR collects information regarding the state, connection, and latency of the intermediate hosts. Thus giving a complete overview of the connection between two hosts on the network.

Continue reading

Building Copy as Plain Text Google Chrome Extension in 15 mins

Today, I wanted to make it easy for me to copy text from the web in plain text format. I read a lot of stuff on the web. When I find good articles I take notes and store them in Evernote. Evernote provides a rich text editor to compose notes. The default behaviour provided by Chrome is to copy the text along with the page style. This is not what I want most of the time. To avoid this, I have to first copy the text in a plain editor like Notepad or browser address bar and then copy it again and paste in Evernote. I am doing this for long time. I thought there has to be a better way. It came to my mind that this can be easily solved by writing a Google Chrome Extension. A Google Chrome extension that adds copy as plain text context menu option.

Continue reading

Mental Models for Software Engineers: Occam’s Razor

Occam’s Razor helps us choose between two or more explanations of a problem. It provides a useful mental model for problem-solving. A razor is a principle or rule of thumb that allows one to eliminate unlike explanations for a phenomenon, or avoid unnecessary actions.

One popular definition of Occam’s Razor is:

If we face two possible explanations which make the same predictions, the one based on the least number of unproven assumptions is preferable, until more evidence comes along.

This mental model help us look for explanations that are least complicated. It does not mean explanation has to be easy so that anyone can grasp it with limited effort but it means explanation can be logically reasoned without making too many assumptions.

Continue reading

Building an Article Extraction Python API with newspaper3k and flask

Today, I was working on an application that required me to extract the main content html for a web page. This is called article extraction. Most of the time you want to extract the text of the article but I wanted to extract HTML of the main content. For example, if you are reading following WashingtonPost article then I want to extract the main HTML content on the left. I don’t want sidebar HTML containing ads or other information.

Continue reading

Issue #30: 10 Reads, A Handcrafted Weekly Newsletter For Software Developers

The time to read this newsletter is 175 minutes.

Do every act of your life as though it were the very last act of your life – Marcus Aurelius

  1. Learn more programming languages, even if you won’t use them: 10 mins read. I first got this advice few years back when I watched a two minute video by Bjarne Stroustrup, creator of C++. He recommended you should not call yourself programmer if you only know one programming language. The magic number he mentioned in the video was 5. This post also makes the same point. Different programming languages are good at different things. Every programming language makes a tradeoff. They help you think about a problem in different way. I got hang of functional programming once I learnt Scheme basics. I try to learn a new programming language every couple of years. I need to start using them in my side projects.

  2. Announcing AMP Real URL: 20 mins read. In case you are not aware, AMP stands for Accelerated Mobile Pages. AMP is an open source standard led by Google that helps speed up access to websites by caching the content near to user. This is good for readers but for content producers there were few issues. The biggest issue with AMP is that rendered webpage has a URL starting with Users have become used to looking at the navigation bar in a web browser to see what web site they are visiting. The AMP cache breaks that experience. In this post by Cloudflare folks authors talk about how they fixed the real origin URL problem with AMP using web packaging and Cloudflare workers.

  3. Infrastructure as Code, Part One: 15 mins read. This is an introductory read on infrastructure as code. If you are not aware of it then you should give it a read. It is a nicely written introduction to IaC.

  4. When rules don’t apply: 30 mins read. This is a 30 mins video that talks about how executives at Apple, Google, eBay, Intuit, and other big tech companies conspire against their own employees by secretly agreeing among themselves not to hire each other employees. Tech companies treat their employees as their assets and cheat them.

  5. Designing a modern serverless application with AWS Lambda and AWS Fargate: 20 mins read. A lot of good ideas in this post on how to build modern applications. The key points for me in this post are:

    1. You should different compute services based on the use case. The post talks about why author used both AWS Lambda and AWS Fargate. For short computation jobs use lambda and for long compute jobs that have no designated end use AWS Fargate.
    2. Give a thought on isolation model when deciding which compute service to use. AWS Lambda compute instances are isolated from each other so if one rogue your application will not suffer.
    3. When you are building a side project or building MVP for your startup your goal should be to minimise maintenance and operation tasks. Serverless services help you do that.
    4. AWS CDK is a service that allows you to write IaC in your own preferred language.
  6. Thundering Herds & Promises: 10 mins read. I love this kind of posts which share how team solved a real-world technical problem. This post covers how Instagram solved the thundering herd problem with their cache using the promises. The below explains explains what thundering herd problem means

    > If your cache is hit with 100 concurrent requests, then, since the cache is empty, all of them will get a cache-miss at the one moment, resulting in 100 individual requests to the backend. If the backend is unable to handle this surge of concurrent requests (ex: capacity constraints), additional problems arise. This is what’s sometimes called a thundering herd.

  7. The Good and the Bad of Google Cloud Run: 10 mins read. The key point made in this post is that Google Cloud Run is not FaaS. Google Cloud Run allows developers to push container images with HTTP server to GCP and GCP takes care of running them at scale. If you have build pure serverless application you will know that pure serverless apps architecture is event-driven service-full architecture. This forces developers to think about applications in a different way. According to author, Cloud Run is providing a safety blanket for developers intimidated by the paradigm shift of FaaS and service-full architecture.

  8. Azure Cosmos DB: Microsoft’s Cloud-Born Globally Distributed Database: 20 mins read.This is a detailed explanation of Azure’s Cosmos DB internals. This article was too technical and detailed for me. I will try to re-read it again to better grasp the underlying details of Cosmos DB.

  9. How to Improve Your Memory (Even if You Can’t Find Your Car Keys): 10 mins read. This post by Adam Grant talks about how to improve your retention power. The key points are:

    1. Take rest after learning a new concept.
    2. Don’t re-read stuff
    3. Try to do a small quiz on what you have learnt or try to explain it to someone
    4. I also apply similar technique in my newsletter by summarising what I have learnt from a post in my own words.
  10. An Overview of Go’s Tooling: 30 mins read. This is the post that you should bookmark if you are a Go developer. The post covers most the Go tools a developers need to interact with. I wish more such posts should be written for other languages as well.

Video for this week:

Mental Models for Software Engineers: Hanlon’s Razor

It is difficult to be sanguine when people around you does something wrong with you. The wrong doing does not have to be extreme it could be as simple as a colleague not replying to your email when you expected them to reply. In many situations like these we tend to assume that other person does not want our good and they are doing it intentionally. The result of these explanations is that we strain our relations with people around us. We can’t read their mind but still we end up assuming what is in their mind. Hanlon’s razor mental model can help us overcome this bias.

Continue reading