The 5 Minute Introduction to DuckDB: The SQLite for Analytics

Updated: 3rd September 2020

A couple of weeks back I learnt about DuckDB while going through the DB Weekly newsletter. It immediately caught my attention because I could quickly see why such a database needs to exist. Most developers are used to working with an embedded, file-based relational database in their local development environment. The most popular choice among embeddable RDBMSs is SQLite. Developers use embeddable databases because no setup is required and they can get started in a couple of minutes. This enables quick prototyping and lets developers iterate quickly on business features.
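To make the "no setup required" point concrete, here is a minimal sketch using the `sqlite3` module from Python's standard library. The `orders` table and its rows are made up purely for illustration.

```python
import sqlite3

# An in-memory database: no server to install and no configuration to write.
# Passing a file path instead of ":memory:" would persist the data to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, used only to have something to query.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 4.5)],
)

# Total amount per customer, sorted by customer name.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
conn.close()
```

The whole database lives inside the Python process, which is what makes this style of database convenient for prototyping.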

DuckDB is similar to SQLite in the sense that it is also designed to be used as an embeddable database. Developers can easily include it as a library in their code and start using it. Later in this post, I will cover how we can use DuckDB with Python.


The 5 minute introduction to Log-Based Change Data Capture with Debezium

A few years back I was working on an application where I had to pull data from an event table (populated using database triggers) and update my application-specific data stores. This is a common problem that most web developers need to solve. At that time I was not aware that this problem has a name. Some time later I learnt that it is called Change Data Capture (CDC). As per the Wikipedia article on change data capture,

In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data.

The key benefit of CDC is that you can identify the changed data in your source database and then apply it incrementally to your target system. In the absence of CDC, we are left with bulk loading the data, which is both time-consuming and costly.
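The idea of applying changes incrementally can be sketched in a few lines of plain Python. This is only an illustration: the event shape (`op`, `key`, `row`) and the `apply_changes` helper are hypothetical and do not reflect Debezium's actual event format.

```python
# Hypothetical change events, in the order the source database produced them.
# The "op"/"key"/"row" field names are illustrative, not a real CDC format.
changes = [
    {"op": "insert", "key": 1, "row": {"name": "alice", "city": "Pune"}},
    {"op": "insert", "key": 2, "row": {"name": "bob", "city": "Delhi"}},
    {"op": "update", "key": 1, "row": {"name": "alice", "city": "Mumbai"}},
    {"op": "delete", "key": 2, "row": None},
]


def apply_changes(target, events):
    """Apply change events to the target in order, touching only changed keys."""
    for event in events:
        if event["op"] == "delete":
            target.pop(event["key"], None)
        else:
            # Inserts and updates are both treated as upserts.
            target[event["key"]] = event["row"]
    return target


target_store = {}
apply_changes(target_store, changes)
print(target_store)
```

Only the rows named in the events are touched, whereas a bulk load would copy every row again regardless of whether it changed.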


The 5 minute introduction to Osquery

Osquery is an awesome host instrumentation framework from Facebook. It can instrument Mac, Linux, and Windows servers. It organises system data in tables that you can query using your favourite query language: SQL. It is SQL for your infrastructure. You can query for system intruders, system information, compliance, installed apps, running processes, and many more data points.

Osquery uses SQLite syntax for SQL. So, if you need more information about SQL syntax beyond what is covered in the osquery documentation, you should give the SQLite documentation a read.
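Because the syntax is SQLite's, the shape of such a query can be tried without osquery installed. The sketch below uses Python's `sqlite3` module with an in-memory table standing in for osquery's `processes` table. The stand-in keeps only two columns (`pid`, `name`) and the rows are invented, so treat it as an illustration of the SQL and not of osquery itself.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Stand-in for osquery's `processes` table, reduced to two columns.
# The rows are invented sample data, not real process information.
db.execute("CREATE TABLE processes (pid INTEGER, name TEXT)")
db.executemany(
    "INSERT INTO processes (pid, name) VALUES (?, ?)",
    [(1, "launchd"), (412, "sshd"), (980, "python3")],
)

# Ordinary SQLite syntax: filter by name prefix and sort by pid.
ssh_processes = db.execute(
    "SELECT pid, name FROM processes WHERE name LIKE 'ssh%' ORDER BY pid"
).fetchall()
print(ssh_processes)
db.close()
```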


A minimalistic guide to distributed tracing with OpenTracing and Jaeger

If you have ever worked on a distributed application, you will know that it is difficult to debug when things go wrong. The two common tools for finding the root cause of a problem are logging and metrics. But logs and metrics fail to give us a complete picture of a situation in a distributed system. They fail to tell us the complete story.

If you are building a system using Microservices / Serverless architecture then you are building a distributed system.

Logs fail to give us the complete picture of a request because they are scattered across a number of log files and it is difficult to link them together to form a shared context. Metrics can tell you that your service has a high response time, but they will not help you easily identify the root cause.
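One way to picture the missing shared context is to give every request an identifier that each service includes in its log lines. The sketch below is a simplified illustration in plain Python, not the OpenTracing API: the `trace_id` field, the service names, and the `group_by_trace` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical log lines collected from three services. Each line carries
# the trace_id of the request it belongs to.
logs = [
    {"service": "gateway", "trace_id": "t-1", "msg": "GET /checkout"},
    {"service": "gateway", "trace_id": "t-2", "msg": "GET /cart"},
    {"service": "orders", "trace_id": "t-1", "msg": "create order"},
    {"service": "payments", "trace_id": "t-1", "msg": "charge card"},
]


def group_by_trace(lines):
    """Group log lines by trace_id, preserving the order they were seen in."""
    by_trace = defaultdict(list)
    for line in lines:
        by_trace[line["trace_id"]].append(f'{line["service"]}: {line["msg"]}')
    return dict(by_trace)


request_view = group_by_trace(logs)
print(request_view["t-1"])
```

With the identifier in place, the three lines for request `t-1` can be read as one story even though they came from different services. A real tracing system records more than this, such as timings and the relationships between operations.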

Logging and Metrics are not enough to build observable systems.

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. It helps bring visibility into systems. – Wikipedia

Logs, metrics, and traces are the three pillars of observability. While most software teams use logging and monitoring, few of them use traces. Before we look at distributed tracing in depth, let’s define logs, metrics, and traces.
