GOH 25 - Introduction to eBPF with Grafana Beyla, with Nikola Grcevski

# [[GOH 25 - Introduction to eBPF with Grafana Beyla, with Nikola Grcevski]] [in developer-advocacy](obsidian://open?vault=developer-advocacy&file=projects%2FGrafana%20Office%20Hours%2FGOH%2025%20-%20Introduction%20to%20eBPF%20with%20Grafana%20Beyla%2C%20with%20Nikola%20Grcevski)

<iframe width="560" height="315" src="https://www.youtube.com/embed/ZEUzucqXUnQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Related:: "[[Paul Balogh]], [[Nikola Grčevski]]"

![[goh25-introduction-to-ebpf.png]]

## Talking points

- Introduce Nikola
    - Who are you?
    - What do you do?
    - How long have you worked at Grafana/on Beyla?
    - What did you do previously?
        - compiler engineer, worked on Java SDK
        - Elasticsearch DB
- The problems with instrumentation
    - Sometimes when you see a problem, it's too late: the issue has already occurred and you didn't have the application instrumented.
    - Types of instrumentation
        - Source instrumentation (instrumenting/modifying application code)
        - Binary instrumentation (instrumenting/modifying the application binary instead of the code)
        - External instrumentation, including eBPF: doesn't quite fall under source or binary, but is also sort of both.
            - In [[Java]], the virtual machine lets you extend it with additional agents that have enough privileges to observe the VM. This isn't source or binary.
            - eBPF: the approach differs depending on the programming language
                - For Java, we can tap into the Linux kernel and monitor signals, then stitch them together.
                - In other cases, such as Go, we attach to the binary (without modifying it), like a debugger, and use something close to breakpoints.
    - Issues
        - multiple agents collecting different signals
        - agents don't exist for every language
        - proliferation of exporters
    - The dream: automagically observe everything with little to no effort
- What is eBPF?
    - Only works on the Linux kernel (eBPF on Windows is being worked on but it's not operational yet)
    - What is auto-instrumentation? How close is it to the automagical dream?
    - eBPF used to be just BPF (Berkeley Packet Filter). You could already extend the kernel with additional logic through modules, making the kernel programmable.
    - eBPF (with the e) can also extend the kernel, but it's done completely differently now. We have a virtual machine in the kernel (just like the [[JVM]]) that can execute instructions.
        - VMs are better than modules because they're safer, don't break the kernel, and are isolated.
    - eBPF can be used for other purposes: security, load balancing ([[Circuit breaker pattern]]), monitoring (monitor socket events and build observability solutions)
        - A [[Service mesh]] can also do load balancing in the same way, just at a higher level
    - In eBPF there are usually two sides that you need:
        - kernel side: written in [[C]]
        - user space side: written in [[Go]] (or another language)
- What is [[Grafana Beyla]]?
    - Initial goal: make a tool that captures signals for application monitoring, not infrastructure monitoring
    - [[HTTP]] and [[gRPC]]
    - Demo of how to set it up on Kubernetes
        - sidecar container
        - In the config, give it permissions. Then tell Beyla what port to use and listen to.
    - What languages does it support?
        - They have [[Distributed tracing]] working on Go now, but it's not yet released.
        - Go, Python, Ruby, R, Elixir, pretty much any language now at the protocol level
    - Some nice features
        - Beyla automatically reduces the cardinality of URL paths by collapsing them, so that when you send this to Prometheus, you don't get charged more because of a cardinality explosion.
        - Beyla tells you not just the response time but also the time that passed *before* the goroutine picked up and actioned a request. This can only be seen at the kernel level.
- What is Application O11y? (demo)
- Can Beyla only be run on Grafana Cloud?
- Is it possible to do distributed tracing with Beyla/eBPF?
    - It is, but it hasn't been released yet. They have an implementation for Go and HTTP. They'll likely have to take different approaches for different protocols.
    - They plug into the Go runtime by attaching probes to `newproc1`. Trace IDs are generated when an incoming request is received, and tracked through the Go runtime as goroutines start the flow within the Go application. This lets them tie incoming requests to the outgoing requests they trigger.
    - When an outgoing request writes its HTTP headers, they access the Go process memory and rewrite the trace ID, just as you would with manual instrumentation. This happens at a specific time and place in the Go libraries where they know it's safe to do so. This can be done manually, but Beyla does it for you.
    - One specific scenario where this doesn't work: Secure Boot on Linux (virtual machines on Kubernetes), because they're not allowed to write memory with eBPF.
    - In the process of doing this for gRPC. It's not yet ideal.
    - Future plans: do the same thing or something similar for other languages. Some languages will be simpler than others, because languages don't all manage internal threads the same way.
        - For those that are single-threaded in nature, it will be easier: [[Ruby]], [[Python]], [[NodeJS]]
        - For [[Java]], it's almost impossible. With a reactive framework especially, requests are impossible to track.
- Partnership with [[Isovalent]] (now [[Cisco]])
    - What is [[Cilium]] and how does it compare to Beyla?
        - Cilium captures things at the network level. There are some limitations to that; it's a different approach. We believe that instrumenting at the application level rather than the network level can give us richer information. For example, for Go, we can get information about the runtime. It's easier to track things at the protocol level, like SQL calls in Go, at the application level.
        - At some level, some of the information can also be extracted by both.
          But we think Beyla can add extra value.
        - Cilium libraries are also used in Beyla: Beyla is built with Cilium's Go libraries
    - What protocols does Beyla do that Cilium can't?
        - The HTTPS implementation in Beyla doesn't require certificates, unlike Cilium
    - How does the Cisco acquisition affect us, if at all?
- What is a service mesh? How is Beyla different from a service mesh like Istio or Linkerd?
    - This is similar to Cilium: they can extract some info about Layer 7 HTTP events. But how do you do gRPC? How do you do Layer 2? That's going to be more challenging.
    - A service mesh works at the network level, so Beyla can get more information than a service mesh
    - Differences in protocols supported?
- Goal for Beyla: Replicate the manual instrumentation in the [[OpenTelemetry|OTel]] demo, but automatically.
- Outro
    - If people want to learn more about this topic, where should they go?

## Timestamps

00:00:00 Introductions
00:03:17 Why instrumenting for observability can be difficult
00:12:48 How does eBPF auto-instrumentation actually work?
00:16:52 Use cases for eBPF
00:24:29 What is Grafana Beyla?
00:31:26 Demo: Setting up Beyla on Kubernetes
00:44:34 Demo: Application Observability on Grafana Cloud
00:46:06 Distributed tracing with Beyla
00:52:41 Can you use Beyla without Grafana Cloud?
00:53:43 Alternatives for Beyla, and differences
00:58:54 Ultimate goal for Beyla
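
## Sketch: collapsing URL paths to limit cardinality

The talking point about Beyla collapsing URL paths can be illustrated with a small sketch. This is not Beyla's actual algorithm; it only shows the general idea of replacing identifier-like path segments with a placeholder so that many concrete URLs map to one route label. The function name `collapse_path`, the `*` placeholder, and the three "looks like an ID" rules are all assumptions made for illustration.

```python
import re

# Hypothetical heuristics: a segment counts as an identifier if it is
# all digits, a UUID, or a long hex string. Real tools may use other rules.
_ID_PATTERNS = [
    re.compile(r"^\d+$"),
    re.compile(
        r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
    ),
    re.compile(r"^[0-9a-fA-F]{16,}$"),
]


def collapse_path(path: str, placeholder: str = "*") -> str:
    """Replace identifier-like path segments with a placeholder."""
    path = path.split("?", 1)[0]  # drop the query string, if any
    segments = path.split("/")
    collapsed = [
        placeholder if seg and any(p.match(seg) for p in _ID_PATTERNS) else seg
        for seg in segments
    ]
    return "/".join(collapsed)


if __name__ == "__main__":
    for raw in ["/users/123/orders/456", "/users/987/orders/2", "/health"]:
        print(raw, "->", collapse_path(raw))
```

With these rules, `/users/123/orders/456` and `/users/987/orders/2` both become `/users/*/orders/*`, so a metrics backend sees one route label instead of one label per user and order.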
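
## Sketch: propagating a trace ID by hand

The distributed tracing notes say Beyla rewrites the trace ID in outgoing HTTP headers "just as you would with manual instrumentation". The sketch below shows roughly what that manual step looks like. It assumes the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`); the notes do not say which header format is used, and the function name `outgoing_traceparent` is hypothetical.

```python
import re
import secrets

# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def outgoing_traceparent(incoming_headers: dict) -> str:
    """Build the traceparent value for an outgoing call.

    Reuses the trace ID of the incoming request when a valid one is
    present, otherwise starts a new trace. A fresh span ID is always
    generated for the outgoing call. Header keys are assumed lowercase.
    """
    match = _TRACEPARENT.match(incoming_headers.get("traceparent", ""))
    if match and match.group(1) != "0" * 32:
        trace_id, flags = match.group(1), match.group(3)
    else:
        # No usable incoming context: start a new, sampled trace.
        trace_id, flags = secrets.token_hex(16), "01"
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"


if __name__ == "__main__":
    incoming = {
        "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    }
    print(outgoing_traceparent(incoming))
```

With manual instrumentation, the application calls something like this itself and sets the header on every outgoing request. The point made in the episode is that Beyla does the equivalent from outside the process.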