%%
date:: [[2022-10-28]], [[2022-11-03]], [[2023-11-27]], [[2023-11-30]], [[2023-12-04]], [[2024-01-16]]
%%
# [[Distributed tracing]]
Distributed tracing is the process of tracking a request or transaction as it moves through the different components of a system. It is a pillar of [[Observability]], along with metrics and logs. Metrics and logs both record events from the point of view of individual application components; distributed tracing takes a different approach by following the path of a single request through the application.
You can think of a trace as a type of structured [[Logs|log]] that includes context, correlation, hierarchy, and other information. [^otel]
<iframe width="560" height="315" src="https://www.youtube.com/embed/zDrA7Ly3ovU?si=A_klewkaYA4KLvDx&start=353&end=950" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
[^goh]
<iframe width="560" height="315" src="https://www.youtube.com/embed/ZirbR0ZJIOs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
## Why do distributed tracing?
Distributed tracing is driven by the stimuli introduced to an application: it closely examines the effect of a single transaction. This makes it useful for performance or latency optimization, as well as for root cause analysis of identified issues. [^daniel1]
Distributed tracing is useful, if not essential, when testing [[Microservices]] or [[Service-oriented architecture]]s because of the [[Disadvantages of microservices#Increased difficulty in troubleshooting|increased difficulty in troubleshooting]]. It helps us:
- Monitor service health
- Perform [[Root cause analysis]] to get to the bottom of an issue
- Improve application performance [^simme]
### What does a trace show that metrics and logs don't?
A trace shows *where* in a system the operations triggered by a user action went. In a way, it can be seen as a sort of [[Lead indicator]], whereas metrics and logs tend to be [[Lag indicator|lag indicators]].
### What's the difference between tracing and [[Continuous Profiling|profiling]]?
## Traces and spans
A span is the smallest unit of a trace: it represents a single operation and records the duration of that operation. Spans look much like the bars in a typical network waterfall graph.

A trace is a [[Directed acyclic graph (DAG)]] of spans: each span refers back to the span that caused it, so the trace can be drawn as a diagram that flows from one node to the next. [^Jaeger]
A trace is like an E-PASS device that tags cars at certain toll points, measuring the time between two points along with metadata like the license plate and ownership details.
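To make the span and trace relationship concrete, here is a minimal sketch in Python. The `Span` class, its field names, and the service names are all hypothetical, not taken from any tracing library. It also assumes each span has at most one parent, which makes the trace a tree (the simplest kind of DAG).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One operation plus its timing (hypothetical structure for illustration)."""
    span_id: str
    name: str
    start_ms: float
    end_ms: float
    parent_id: Optional[str] = None  # None marks the root span of the trace

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def render_waterfall(spans: list[Span]) -> list[str]:
    """Return one indented line per span, listing parents before children."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def visit(parent_id: Optional[str], depth: int) -> None:
        # Siblings are ordered by start time, as in a waterfall graph.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms:g} ms)")
            visit(span.span_id, depth + 1)

    visit(None, 0)
    return lines


# A made-up trace: one user request that fans out to two services.
trace = [
    Span("a", "GET /checkout", 0.0, 120.0),
    Span("b", "auth-service", 5.0, 25.0, parent_id="a"),
    Span("c", "cart-service", 30.0, 110.0, parent_id="a"),
    Span("d", "db query", 40.0, 90.0, parent_id="c"),
]
print("\n".join(render_waterfall(trace)))
```

The indentation in the printed output mirrors the parent and child links, which is the same hierarchy a tracing UI shows as nested bars.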
## The trace process
### The header
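Tracing typically starts by adding a header to a request. The header contains a unique string that is used as the trace ID. The header itself is usually plain text; [[Protobuf]] is more commonly the encoding used for exported span data, for example in OTLP.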
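As an illustration, one common plain-text form is the W3C Trace Context `traceparent` header, which has four dash-separated hex fields: version, trace ID, parent span ID, and flags. The sketch below builds such a header with the standard library only; the function name `new_traceparent` is hypothetical, and a real application would normally let its tracing library do this.

```python
import secrets


def new_traceparent(sampled: bool = True) -> str:
    """Build a header value in the W3C traceparent layout (illustrative helper)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"  # lowest bit set means "sampled"
    return f"00-{trace_id}-{span_id}-{flags}"


# Attach the header to an outgoing request's header dictionary.
headers = {"traceparent": new_traceparent()}
print(headers["traceparent"])
```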
### The propagator
The propagator is the component that writes the trace context into outgoing request headers and reads it back from incoming ones, using an agreed header format. Several formats exist (W3C Trace Context, Jaeger, B3, OT), but the most common is W3C.
The recorded span data is then sent in batches to a distributed tracing tool.
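A rough sketch of what propagation involves, assuming the W3C format (a `traceparent` header with dash-separated version, trace ID, parent span ID, and flags). The `extract` and `inject` functions here are hypothetical stand-ins written with the standard library, not the API of a specific tracing SDK.

```python
import re
import secrets
from typing import Optional

# version - trace ID (32 hex) - parent span ID (16 hex) - flags
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def extract(headers: dict) -> Optional[dict]:
    """Read the trace context from incoming headers, or None if absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    _version, trace_id, parent_id, flags = match.groups()
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def inject(context: dict, headers: dict) -> dict:
    """Return a copy of headers carrying the same trace ID and a fresh span ID."""
    new_span_id = secrets.token_hex(8)  # this service's span becomes the new parent
    outgoing = dict(headers)
    outgoing["traceparent"] = (
        f"00-{context['trace_id']}-{new_span_id}-{context['flags']}"
    )
    return outgoing


# A service receives a request and calls the next service downstream.
incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
ctx = extract(incoming)
outgoing = inject(ctx, {"accept": "application/json"})
print(outgoing["traceparent"])
```

The trace ID stays the same across every hop, while the span ID changes at each service. That shared trace ID is what lets the tracing tool stitch the spans back together.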
### The exporter
An exporter sends the finished span data to an endpoint, such as a tracing backend, where it can be stored and visualized.
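The idea of an exporter can be sketched as a small buffer that hands spans to a sender. Everything below is hypothetical: `BatchExporter`, `on_span_end`, and the `send` callable are illustrative names, and a plain list stands in for the network endpoint so the example runs without a backend.

```python
from typing import Callable


class BatchExporter:
    """Buffer finished spans and hand them to `send` in batches (illustrative only)."""

    def __init__(self, send: Callable[[list], None], max_batch_size: int = 3):
        self._send = send                  # stand-in for an HTTP or gRPC call
        self._max_batch_size = max_batch_size
        self._buffer: list = []

    def on_span_end(self, span: dict) -> None:
        """Queue a finished span and flush once the batch is full."""
        self._buffer.append(span)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, if anything, then empty the buffer."""
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()


# Use a list as a fake endpoint so each batch can be inspected.
received: list = []
exporter = BatchExporter(send=received.append, max_batch_size=2)
for name in ["auth", "cart", "db"]:
    exporter.on_span_end({"name": name})
exporter.flush()  # send the leftover span, e.g. at shutdown
print(received)
```

Batching trades a little delay for fewer network calls; the final `flush()` matters because a partly filled batch would otherwise never be sent.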
## [[Distributed tracing protocols]]
## [[Observability Tools#Distributed tracing Traces|Distributed tracing tools]]
## [[Distributed tracing and performance testing]]
## See also
- [[Observability]]
## References
[^Jaeger]: [Jaeger](https://www.jaegertracing.io/docs/1.22/architecture/)
[^daniel1]: [[k6 Tech Talk 20210303]]
[^simme]: Aronsson, S. (2021). _Intro to distributed tracing with Tempo, OpenTelemetry, and Grafana Cloud._ Retrieved from https://grafana.com/blog/2021/09/23/intro-to-distributed-tracing-with-tempo-opentelemetry-and-grafana-cloud . [[Intro to Distributed Tracing With Tempo, OpenTelemetry, and Grafana Cloud|My highlights]].
[^goh]: Elliott, J., van der Hoeven, N., and Balogh, P. (2023). [[GOH 22 - How to get started with Tempo with Joe Elliot]]