Choosing Primary Key Type in Postgres

In relational database design, one of the key decisions is choosing the right primary key type for your tables. In this post I am talking about surrogate (or synthetic) primary keys, so called because they are not derived from application data. In my experience, very few teams give proper thought to choosing primary key types; for each table they go with whatever default their organization uses, which means every table ends up with an int/bigint, uuid, or varchar key. I am not making any recommendation in this post. I am only discussing how different key types affect insertion speed and data size, and how they compare against each other. You will have to do your own analysis to choose the right data type for your use case. Different tables have different needs, so make your judgement accordingly; there is no one-size-fits-all solution.

In this post, I am using Postgres 13.3. You can install it using your operating system's package manager; I am running it in a Docker container.

To follow along, you can create a sample database and run the SQL queries in it.

create database choosing_pk_type;

Connect to this database

\c choosing_pk_type;
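
Before looking at the summary, here is a minimal sketch of the kind of tables such a comparison involves. The table and column names are illustrative, not from the post; gen_random_uuid() ships with Postgres 13, so no extension is required.

    -- bigint surrogate key generated from a sequence
    create table users_bigint (
        id   bigint generated always as identity primary key,
        name text not null
    );

    -- random uuid surrogate key
    create table users_uuid (
        id   uuid default gen_random_uuid() primary key,
        name text not null
    );

    -- load identical data into both and compare table + index sizes
    insert into users_bigint (name)
    select 'user_' || n from generate_series(1, 100000) n;

    insert into users_uuid (name)
    select 'user_' || n from generate_series(1, 100000) n;

    select pg_size_pretty(pg_total_relation_size('users_bigint')) as bigint_size,
           pg_size_pretty(pg_total_relation_size('users_uuid'))   as uuid_size;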

Primary Key Types Comparison Summary

The summary numbers, the comparison table they are derived from, and the TLDR version are in the full post.

Continue reading “Choosing Primary Key Type in Postgres”

Correctly using Postgres as queue

I am building a central notification dispatch system that is responsible for sending different kinds of notifications to the end customer. It relies on multiple third-party APIs for sending the actual email/SMS notifications. At a high level, the architecture of the system is shown below.

NotificationSender exposes both REST and messaging interfaces for accepting consumer requests. Consumers here refer to the services that need to send notifications. This is what the notification system does:

  • It accepts requests from upstream services and stores them in the Postgres database after validation. The notification event is written to the Postgres database in the ENQUEUED state. It returns HTTP 202 Accepted to the upstream service if the request is valid, else it returns HTTP 400 Bad Request.
  • At a predefined frequency, a poller that is part of the NotificationDispatcher polls the Postgres database for new notification events, i.e. events in the ENQUEUED state. For now, it respects insertion-time order (a sketch of such a polling query follows this list).
  • If enqueued events are found, it processes them and sends the actual notifications using the downstream SMS and Email services.
  • After processing the events, it changes their state to PROCESSED.
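
The post's actual queries are not shown in this excerpt, but the standard way to poll a Postgres table safely with concurrent pollers is FOR UPDATE SKIP LOCKED. A minimal sketch, with a hypothetical notification_events schema:

    -- Hypothetical schema; the real table is not shown in the excerpt.
    create table notification_events (
        id         bigint generated always as identity primary key,
        payload    jsonb not null,
        status     text not null default 'ENQUEUED',
        created_at timestamptz not null default now()
    );

    -- Claim a batch of enqueued events and mark them processed in one statement.
    -- SKIP LOCKED makes concurrent pollers pick different rows instead of
    -- blocking on, or double-processing, rows another poller has claimed.
    update notification_events
    set status = 'PROCESSED'
    where id in (
        select id
        from notification_events
        where status = 'ENQUEUED'
        order by created_at        -- respect insertion-time order
        limit 10
        for update skip locked
    )
    returning id, payload;

In practice you would likely move claimed rows through an intermediate PROCESSING state and flip them to PROCESSED only after the downstream send succeeds, so that a crash mid-send does not silently drop events.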
Continue reading “Correctly using Postgres as queue”

Issue #6: 10 Reads, A Handcrafted Weekly Newsletter for Humans

Hey y’all,
Here are 10 reads I thought were worth sharing this week. The total time to read this newsletter is 130 minutes.
Half of everything you hear in a classroom is crap. Education is figuring out which half is which. — Larrabee’s Law
  1. The 5 Lies We Love to Tell : 10 mins read. The author makes the point that we all lie to ourselves, and we should stop pretending that we don’t. The biggest problem with lies is that they consume a lot of your mental power in the background: you have to expend energy reminding yourself which lies you have told yourself so that you don’t deviate from them.
  2. Please Stop Using Adblock (But Not Why You Think) : 10 mins read. The key point in this post is that Adblock makes a lot of money by charging advertisers like Google to whitelist their ads. I was shocked to read about the dark side of AdBlock. The author recommends the free and open source uBlock Origin instead.
  3. Tech’s Two Philosophies : 15 mins read. The author makes the point that there are two main philosophies in the tech industry. The first, shared by Google and Facebook, is that computers should do work for humans. The second, shared by Microsoft and Apple, is that computers should empower humans to do their own work more efficiently.
  4. It’s about time to design in the real world. Introducing Hadron! : 10 mins read. Hadron is a tool that aims to make designing through code visual, fast, and easy by embracing the web platform. Even though you will use code, very little writing is needed to get started, and your designs can be progressively enhanced: you can start with only simple HTML and CSS, and later make the design do more by adding behaviour through other Web Components or even writing JS yourself.
  5. The Economics of Writing a Technical Book : 15 mins read. This is a good post that will help you understand the economics of writing a technical book. In my opinion, the author made good money from his first book; part of that has to do with the fact that he wrote for O’Reilly Media. Writing a book is tiring and cumbersome, so kudos to the author on publishing his first.
  6. Three-day no-meeting schedule for engineers: 5 mins read. It is great to see organisations understanding the need for focussed time. I strongly believe what Paul Graham wrote in his essay on Manager vs Maker schedule. Software development is a creative endeavour that requires an undistracted, peaceful environment for good work. I hope my organization also does the same one day.
  7. High availability and scalable reads in PostgreSQL : 20 mins read. A detailed primer on streaming replication, complete with performance measurements.
  8. How Postgresql analysis helped to gain 290 times performance boost for a particular request : 10 mins read. This is an interesting read: the author tried many difficult solutions before figuring out a simple change that improved the performance of his query 290 times. Simple solutions are difficult to find. The post shows how minor mistakes in query and data model design can lead to performance bottlenecks, and how extremely useful the EXPLAIN ANALYZE command can be.
  9. Experiences with running PostgreSQL on Kubernetes : 20 mins read. Kubernetes is not aware of the deployment details of Postgres, and a naive deployment could lead to complete data loss. Here’s a typical scenario. You set up streaming replication, and the first master is up; all writes go to it and replicate asynchronously to the standby. Then the master suddenly goes down while the asynchronous replication has a huge lag, caused by something like a network partition. If a naive failover leader-election algorithm kicks in, or an administrator who doesn’t know the state manually triggers failover, the standby becomes the master and therefore the source of truth. All of the writes that were never replicated are lost. Whenever the admin recovers the first master, it is no longer the master and has to completely sync its state from the second node, which is now the master. (A query for inspecting replication lag is sketched after this list.)
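
Since the damage in that failover scenario grows with replication lag, it is worth knowing how to inspect the lag. A minimal sketch using the standard pg_stat_replication view (run on the primary; this is my addition, not from the linked post):

    -- Run on the primary: shows how many bytes of WAL each standby
    -- has yet to replay.
    select application_name,
           state,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) as replay_lag_bytes
    from pg_stat_replication;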

TIL #1 | PostgreSQL Gzip Dump, Docker-Compose, Change owner of PostgreSQL tables, and finding non-ascii characters in IntelliJ

Each day I learn something new while doing software development. Starting today, I am documenting these learnings and will be posting the tips and tricks I discover.

Continue reading “TIL #1 | PostgreSQL Gzip Dump, Docker-Compose, Change owner of PostgreSQL tables, and finding non-ascii characters in IntelliJ”

Accessing PostgreSQL Server From Remote Machine

Today I was trying to connect to a PostgreSQL server from a remote machine, and it took me some time to configure. So, in this short blog post I am sharing the procedure to make it work.

  1. Edit the pg_hba.conf file to enable client connections as shown below. The important thing is the use of 0.0.0.0/0, which allows client access from any IP address; you can also specify a narrower subnet here.
    # TYPE  DATABASE        USER            ADDRESS                 METHOD
    
    # "local" is for Unix domain socket connections only
    local   all             all                                     trust
    # IPv4 local connections:
    host    all             all             127.0.0.1/32            trust
    host    all             all             0.0.0.0/0                trust
    # IPv6 local connections:
    host    all             all             ::1/128                 trust
    # Allow replication connections from localhost, by a user with the
    # replication privilege.
    #local   replication     postgres                                md5
    host    replication     postgres        0.0.0.0/0               trust
    host    replication     postgres        ::1/128                 trust
    
  2. Edit the postgresql.conf file and set listen_addresses = '*'. This means any client can access the server; the default value is localhost. You can also specify a comma-separated list of IP addresses.
  3. Restart the PostgreSQL server, either using the postgres service or by stopping the process and starting it again with the pg_ctl script.
  4. The last and most important thing is to edit the /etc/sysconfig/iptables file and add an entry to allow access to port 5432. Add the line shown below to the iptables file.
    -A INPUT -m state --state NEW -m tcp -p tcp --dport 5432 -j ACCEPT
  5. Then restart the iptables service using service iptables restart
  6. Finally, test the setup with the psql client by executing the following command: ./psql -U postgres -h 192.168.0.10 -p 5432
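
If the connection succeeds, you can confirm from inside the remote psql session which server you are actually talking to; inet_server_addr() and inet_server_port() are standard Postgres functions:

    -- Shows the address and port of the server this session is connected to.
    select inet_server_addr(), inet_server_port();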