If you have ever worked on a distributed application you will know that it is difficult to debug when things go wrong. The two common tools to figure out root cause of the problem are logging and metrics. But the fact of the matter is that logs and metrics fail to give us complete picture of a situation in a distributed system. They fail to tell us the complete story.
If you are building a system using Microservices / Serverless architecture then you are building a distributed system.
Logs fail to give us the complete picture of a request because they are scattered across a number log files and it is difficult to link them together to form a shared context. Metrics can tell you that your service is having high response time but it will not be able to help you easily identify the root cause.
Logging and Metrics are not enough to build observable systems.
Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. It helps bring visibility into systems. – Wikipedia
Logs, metrics, and traces are the three pillars of observability. While most software teams use logging and monitoring few of them use traces. Before we look at distributed tracing in depth, let’s define logs, metrics, and traces.