Kubernetes Tip: How to access Pod metadata in containers running inside the pod?

Today, I faced a requirement where a running container need to access Pod’s metadata. For my usecase, the running container need to know the namespace it belonged to. After spending time on Google, I learnt about Kubernetes Downward API that exposes Pod information to running container in the form of environment variables.

Continue reading “Kubernetes Tip: How to access Pod metadata in containers running inside the pod?”

The Kubernetes Guide: Part 1: Learn Kubernetes by deploying a real-world application on it

This is the guide I wish I had when I was starting my Kubernetes journey. Kubernetes is a complex technology with many new concepts that takes time to get your head around. In this guide, we will take an incremental approach to deploying applications on Kubernetes. We will cover what and why of Kubernetes and then we will learn how to deploy a real-world application on Kubernetes. We will first run application locally, then using Docker containers, and finally on Kubernetes. The guide will also cover Kubernetes architecture and important Kubernetes concepts like Pods, Services, Deployment.

In this guide, we will cover following topics:

  1. What is Kubernetes?
  2. The real reasons you need Kubernetes
  3. Kubernetes Architecture
  4. Deploying a real world application on Kubernetes

What is Kubernetes?

Kubernetes is a platform for managing application containers across multiple hosts. It abstracts away the underlying hardware infrastructure and acts as a distributed operating system for your cluster.

Kubernetes is a greek for Helmsman or Pilot (the person holding the ship’s steering wheels)

kubernetes-os

Kubernetes play three important roles:

  1. Referee
    • Kubernetes allocates and manages access to fixed resources using build in resource abstractions like Persistent Volume Claims, Resource Quotas, Services etc
    • Kubernetes provides an abstracted control plane for scheduling, prioritizing, and running processes.
    • Kubernetes provides a sandboxed environment so that applications do not interfere with each other.
    • Kubernetes allows users to specify the memory and CPU constraints on the application. It will ensure application remain in their limits.
    • Kubernetes provides communication mechanism so that services can talk among each other if required.
  2. Illusionist
    • Kubernetes gives the illusion of single infinite compute resource by abstracting away the hardware infrastructure.
    • Kubernetes provides the illusion that you need not care about underlying infrastructure. It can run on a bare metal, in data centre, on the public cloud, or even hybrid cloud.
    • Kubernetes gives the illusion that applications need not care about where they will be running.
  3. Glue
    • Kubernetes provides common abstractions like Services, Ingress, auto scaling, rolling deployment , volume management, etc.
    • Kubernetes comes with security primitives like Namespaces, RBAC that applications can use transparently

I learnt about the three roles – Referee, Illusionist, and Glue from the book Operating Systems Principles and Practices by Thomas Anderson and Michael Dahlin

Continue reading “The Kubernetes Guide: Part 1: Learn Kubernetes by deploying a real-world application on it”

Kubernetes Tip: How to refer one environment variable in another environment variable declaration?

In Kubernetes, one way to pass configurable data to containers is using environment variable. Below is a pod definition that uses two environment variables.

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - image: com.shekhargulati/api
      name: api
      env:
        - name: DATABASE_NAME
          value: "mydb"
        - name: DATASOURCE_URL
          value: jdbc:mysql://mysql:3306/mydb      
      ports:
        - containerPort: 8080

As you can see in the above Pod definition, we are using database name mydb twice. Isn’t it will be awesome if we can use DATABASE_NAME in the DATASOURCE_URL?

Kubernetes supports this use case by providing $(VAR) syntax as shown below.

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - image: com.shekhargulati/api
      name: api
      env:
        - name: DATABASE_NAME
          value: "mydb"
        - name: DATASOURCE_URL
          value: "jdbc:mysql://mysql:3306/$(DATABASE_NAME)"
      ports:
        - containerPort: 8080

Issue #6: 10 Reads, A Handcrafted Weekly Newsletter for Humans

Hey y’all,
Here are 10 reads I thought were worth sharing this week. The total time to read this newsletter is 130 minutes.
Half of everything you hear in a classroom is crap. Education is figuring out which half is which. — Larrabee’s Law
  1. The 5 Lies We Love to Tell : 10 mins read. The author makes the point that we all lie to ourselves. We should stop fooling ourselves that we don’t lie. The biggest problem with lies is that they consume a lot of your mental power in the background. You have to expend your mental energy to keep reminding yourself what lie you made to your inner self so that you don’t deviate from it.
  2. Please Stop Using Adblock (But Not Why You Think) : 10 mins read.  The key point in this post is that Adblock is making a lot of money by making advertisers like Google to whitelist their ads. I was shocked to read the dark side of AdBlock. The author recommends that people use free and open source uBlock Origin.
  3. Tech’s Two Philosophies : 15 mins read. The author makes the point that there are two main philosophies in the tech industry. The first philosophy shared by Google and Facebook is that computers should do work for humans. The second philosophy shared by Microsoft and Apple is that computers empower humans and help them do their work efficiently.
  4. It’s about time to design in the real world. Introducing Hadron! : 10 mins read. Hadron is a tool aimed to make designing through code visual, fast and easy by embracing the web platform. Even though you will use code, the great thing is that not only very little writing is needed to get started, but also your designs can be progressively enhanced. Meaning that you can start designing with only simple HTML and CSS, and later make your design do more by adding behaviour through other Web Components or even writing JS yourself.
  5. The Economics of Writing a Technical Book: 15 mins read. This is a good post that will help you understand economics of writing a technical book. In my opinion, author made good money from writing his first book. Part of it has to do with the fact that he wrote for O’Reilly Media. Writing a book is tiring and cumbersome so kudos to the author on publishing his first book.
  6. Three-day no-meeting schedule for engineers: 5 mins read. It is great to see organisations understanding the need for focussed time. I strongly believe what Paul Graham wrote in his essay on Manager vs Maker schedule. Software development is a creative endeavour that requires an undistracted, peaceful environment for good work. I hope my organization also does the same one day.
  7. High availability and scalable reads in PostgreSQL : 20 mins read. A detailed primer on streaming replication, complete with performance measurements.
  8. How Postgresql analysis helped to gain 290 times performance boost for a particular request : 10 mins read. This is an interesting read as the guy tried many difficult solutions before figuring out a simple change to improve performance of his query by 290 times. Simple solutions are difficult to find. This post shows  how query and data model design minor mistakes can lead to performance bottlenecks and how extremely useful explain analyze command can be.
  9. Experiences with running PostgreSQL on Kubernetes : 20 mins read. Kubernetes is not aware of the deployment details of Postgres. A naive deployment could lead to complete data loss. Here’s a typical scenario when that happens. You set up streaming replication and let’s say the first master is up. All the writes go there and they asynchronously replicate to the standby. Then suddenly the current master goes down but the asynchronous replication has a huge lag caused by something like a network partition. If the naive failover leader election algorithm kicks in or the administrator who doesn’t know the state manually triggers failover, the secondary becomes the master. That becomes the source of truth. All of the data during that period is lost because all of the writes that were not replicated disappear. Whenever the admin recovers the first master it’s no longer the master any more and it has to completely sync the state from the second node which is now the master.