Speaker 1 (00:00:02):
Hi everyone. Welcome back to the Grafana OpenTelemetry community call. We missed Liudmila so much that we had to replace her with not just one other person but a whole cast of characters here for you today. I'm Nicole van der Hoeven. I'm a developer advocate at Grafana and I'm here to stop Ted from taking over the show by just dancing. I'm joined here by my colleague across the world. Yeah.
Speaker 2 (00:00:35):
I'm Tiffany Jernigan, also a developer advocate, and there are two Tiffanys, which is still mind-blowing to me. At the company, I think there's only two of us, maybe three at most, but we're in the same area in terms of work stuff. So it's like, "Tiffany," and it's like, "Wait, which one?" So this is going to be fun. But yeah, it's been a bit fun hanging out with Nicole on these calls, and definitely missing Liudmila. Yeah, Imma.
Speaker 3 (00:01:04):
Yeah, I'm Imma. I'm also a developer advocate for Grafana, also missing Liudmila a lot, so I decided to join. I'm based out of Barcelona, so also the other side of the world from Tiffany. Very happy to be here with our guests, who are the stars of the show for today. If you want to introduce yourself, Tiffany, go ahead.
Speaker 4 (00:01:27):
Thanks. I'm Tiffany. I'm a technical writer at Grafana Labs. I am also a maintainer in the OpenTelemetry project. I work on the communications SIG, which means I handle the website, the registry, the blog, and of course the documentation.
Speaker 1 (00:01:48):
Ted, it's you.
Speaker 5 (00:01:49):
Cool.
Speaker 3 (00:01:50):
Hello.
Speaker 5 (00:01:52):
Ted. I'm one of the co-founders of this project. I've been around since the beginning of the journey and a little bit before that even. And yeah, I work here at Grafana Labs.
Speaker 6 (00:02:03):
And I am Marylia, also from Grafana, but in OpenTelemetry. I am a maintainer, approver, and triager in a few different groups. So JavaScript SDK, communications, contributor experience, Semantic Conventions database and the Ecosystem Explorer.
Speaker 1 (00:02:24):
Awesome. And today, I feel like this should be a celebratory thing. We should have some balloons or wine or something, champagne, because we're here to celebrate the fact that OpenTelemetry is finally graduating. Ted, can you tell us what that actually means, and why terms like graduation are used? It kind of seems like we're graduating from college or getting some sort of diploma. Why is that?
Speaker 5 (00:02:57):
Yeah, it means OpenTelemetry's got to go get a job now and find an apartment and all of that good stuff. So the way the CNCF works, so OpenTelemetry is part of the Cloud Native Computing Foundation, which is itself part of the Linux Foundation. And that's really important for open source projects because you are trying to create something like an industry standard. You have lots of different people from lots of different organizations working on it. So you want to have an organization that houses the project that is sort of like neutral territory but also provides a lot of services for all these different open source projects. And for us, that's the CNCF. And the CNCF likes to take projects in when they're very small and kind of grow them all the way through their journey as opposed to taking on gigantic preexisting projects. That's kind of the general model that they follow.
(00:03:53):
So they have stages of development that projects go through and the kind of services and things that they provide to these projects depend on the stage that they're at. And there's three main stages. The first stage is called Sandbox, which is just it's a very new small project. Maybe it will make it, maybe it won't. We're just more of an experiment and things graduate from the sandbox to incubating. So they go through a review and they become an incubating project, which means they're actually a big deal. They're starting to get traction, but they haven't crossed the chasm yet if people are familiar with that term. So there's early adopters, people are using it, but it hasn't reached large scale mainstream adoption. And graduation is an acknowledgement that a project has in fact crossed the chasm. It is now a very large mainstream project. It's received broad adoption in production at lots of different organizations.
(00:04:57):
And at each stage we get a review and we get feedback and all of this stuff and graduation's kind of the final step and we've now crossed that finish line.
Speaker 1 (00:05:08):
Okay. And what does that mean for OpenTelemetry? What did you have to do? Why would it take a while? I mean, it actually took not much time at all for OpenTelemetry to graduate. I think it was like the second fastest ones after Kubernetes or something or ...
Speaker 6 (00:05:31):
Depends on what you consider fast. If you want to get a sense ... So we started as Sandbox in 2019, became incubating in 2021, and now in 2026 is when it graduated. But the process of the graduation itself also took two years or something.
(00:05:56):
So around that. So it's not a short process, I would say, just because they have to get feedback from users, they have to do due diligence. So it is a long process and OpenTelemetry is a huge project. So there is a lot to ... And even if you say like, okay, I'm going to look at everything today, by the time you start looking, more things are created, more repos are being added, more things keep getting added. So it just keeps increasing the amount of things that can get reviewed. So it is actually quite a comprehensive, long process.
Speaker 1 (00:06:34):
I mean, I think that sounds just like the ... I think that college metaphor is pretty apt then because it takes a while, but after that, there's a lot of authority that's attached to that diploma. I feel like it wasn't that long ago when at conferences people would ask me, "So this OpenTelemetry thing, should we adopt it?" And now it's like, no, it's graduated. I feel like it is the de facto observability standard. Do you think that that's what that is saying as well?
Speaker 5 (00:07:09):
I think it's fair to say that that's been achieved. We still have a roadmap. We still have new things to do, but we also still have some cleanup to do. But signaling to people that it's ready for production, even though it already has a big user base, we have seen this with other CNCF projects in the past, which is when you announce that it's graduated, there are a bunch of people who are sort of sitting on the edge of the pool who use that as their cue to jump in. So we kind of do expect to see a bump in user adoption as a result of this, but also expectations coming along with that, that the project is completely mature and broadly applicable, which is a lot of work. As Marylia said, the project has a huge surface area. That's one of the reasons why it took eight years to get to graduation and why the whole graduation review process itself was so long.
Speaker 6 (00:08:13):
It also depends on the company. People are like, "Okay, I'm going to just start this, my own application from home." They control everything, but the more you get companies like enterprises, they have their own rules about things that they can use or not. And we do know there are a few that say, "We cannot use something that is not marked stable, is not marked graduated." So that is also a signal that like, oh yes, we have people looking into everything. We have people working basically to guarantee that things will work. And you can see more adoption even by things like the amount of contributors that keeps increasing, or even, like Tiffany probably sees with all the communication, the amount of people posting blogs because they want to share how they are using it and things like that on how to adopt. So you can also see that aspect as well from users.
Speaker 4 (00:09:10):
Yeah. I was going to just piggyback on the fact that OpenTelemetry is such a vast project and that we're expecting growth in both users and contributors. To point out, Nicole, that stat that you had that OTel is second behind Kubernetes, I think that actually refers to the fact that we are the second biggest project in all of CNCF now behind Kubernetes. So it is a very exciting time, but it's also quite hard to keep track of everything that's going on.
Speaker 2 (00:09:42):
It's also crazy in general looking at the landscape, because several years back, you would look at the landscape and you're like, "Oh, this isn't so crazy." And now you look at the landscape and you're like, "What?" There's so much stuff. There's just so much stuff there.
Speaker 1 (00:09:56):
Yeah, it's actually, I was looking at the stats and it said 12,000 contributors and 2,800 plus companies that are using OpenTelemetry publicly because I'm sure lots of people are using it and just not saying it. That's really a lot. What is ... Go on, Ted.
Speaker 5 (00:10:18):
Oh, I was just going to say it's the second largest project in the CNCF, but it's also when the CNCF does its kind of landscapes, like what's out there in the world, it's in the top 30 of largest open source projects out there period. So it's pretty wild.
Speaker 1 (00:10:37):
And I think you mentioned that the diversity of the contributors is also a criterion for graduation. Is that right?
Speaker 5 (00:10:51):
There's diversity from the perspective of vendor neutrality. That's something we care quite a bit about in OpenTelemetry. We care about vendor neutrality from a lot of different angles; that's part of trying to be a standard. But an important one is: does the project have some critical dependency on one company, where if that company went down or went away or lost interest in the project, would that result in the plug getting pulled in terms of the people who work on it? So OpenTelemetry's actually been rated a very healthy project from that perspective, in that it has a large number of different organizations paying people to work on it full-time.
Speaker 6 (00:11:40):
And if you want to have a sense, so for example, we have different status of triager, approver, and maintainer that people are helping maintain because as I mentioned, there's a lot of different repos and a lot of things to keep track. So we do have people that help with that. We are around 300 people that have any status today. So it's a lot of people and we also try to make sure that it is diverse. So we don't want to have any of those groups that the majority is like a single company is the one maintaining the whole thing just because we want to keep neutral.
Speaker 2 (00:12:17):
How hard is it to go in the direction of getting to where you eventually can get status?
Speaker 5 (00:12:24):
Interesting question.
Speaker 6 (00:12:26):
So it's going to really depend a lot on the different ... So SIG stands for special interest group. It's almost one SIG per repo. It's not exactly, but it's an easy way to think about it. So you start by like, okay, I want to start contributing. So the first status that you can get is triager. That is a person that can be the first barrier, like, "Okay, there are a lot of issues coming up. Are those issues actually valid? Do they have enough information that people can actually work on that?" So you can help out with this thing and then get to triager status. Now if they're really helping out with reviews, providing feedback, their opinion, creating their own things, that is more what an approver would be. And if it's a person that wants to think more about the roadmap, what are the priorities that we should have, or also help maintain the health of that SIG.
(00:13:27):
So making sure that they are bringing in other triagers, approvers, and maintainers, that would be the maintainer. So how long it takes can vary a lot, because for example, if you look at a SIG that has tons of people contributing, your work is sometimes one among so many that it's hard to notice, so you have to be very visible. But there are SIGs that really need help. So if you go to those ones and say that you can help, you would more easily get a status, if that is your goal, because your impact would be so much larger just by helping that small group. So if anyone is considering helping, please help the ones that need it. I think Tiffany would have a call-out here for comms.
Speaker 4 (00:14:15):
Yeah. I mean, I would add on to that, that definitely look for the small SIGs that are busy and need help, but there are large SIGs that have lots of people and still need help. So my recommendation for someone who wants to earn a status in OpenTelemetry is find the area that you have knowledge in. So whatever that might be, whether it's a language SIG or collector or communications, and then start reviewing other people's PRs. That is a huge help to the existing maintainers and approvers. We obviously welcome new contributions and new PRs, but we have a backlog of existing PRs and we can really use the help with approvers. So if you want to get noticed fast, my recommendation is find your subject matter area and start reviewing PRs there. That would be my thing. And then to Marylia's comment: if you are into blogs or into writing and reviewing and social media and that kind of stuff, comms could really use help in that area, because we've seen an influx of blog contributors just in the last four or five months and we can really use help there.
Speaker 6 (00:15:36):
Oh, so one thing that I want to point out: if you want to be a contributor, we are looking for people that we can trust and kind of help grow through those different statuses. So if you're coming with a lot of AI where you don't know how to explain what is actually being done, or if you're creating a bunch of blog posts that are clearly AI generated, I'm already telling you, you're not going to get a status, because if it is just using AI, it is not you, right? It's AI. We want the person, we want a contributor, we want to be able to have somebody that will eventually become a maintainer that would help to make the project grow. So that is what we're looking for. We are not looking for quantity, we are looking for quality.
Speaker 5 (00:16:23):
And I would really emphasize something Tiffany just said, which is it's like helping other people. Sometimes I've seen people come in and they've been active and then they come to me later like, "Hey, how come I want to be an approver or maintainer? Why am I not getting recognized for that?" And then when we look at what they're doing, they're mostly filing their own issues and their own PRs and asking people to review their stuff and prioritize the things that they want to work on. And of course we want people to express that kind of interest, but really the thing about being an approver and a maintainer is about working on other people's stuff and working on the priorities of the working group. So that's really the fast track. The more you're going to the maintainers of a SIG and saying, "Hey, what do you need help with?
(00:17:12):
What can I take off your plate?" That's the stuff that really gets people's attention.
Speaker 1 (00:17:19):
And Tiffany H, I'm just going to say H and J right now because otherwise you'll both want to answer. I think that part of the process of graduating is actually the documentation, right? Can you tell us a little bit more about what the requirements were around that? I mean, is there like a hard requirement or just there should be some or how was that?
Speaker 4 (00:17:47):
I mean, Ted and Marylia might actually know more about the specifics, but from not being directly involved in the discussions about graduation, what I saw was we needed to make sure that the spec and semantic conventions were fully documented and we needed to make sure that the collector component documentation was easily accessible. I think that was one of the main points that we heard. It's like an evergreen problem in OpenTelemetry that we have over a hundred repositories and most of the component-specific documentation lives with the code, and we want to keep it that way because it's more likely that the engineers will keep it updated: as they're writing the code, they update the documentation. The problem is findability. It's really difficult to find that one README that you're looking for if it's buried in a GitHub repository. So that is an especially big problem with the collector, which has hundreds of components, be they receivers, processors, or exporters.
(00:19:01):
And all of the component-specific documentation lives in GitHub. So one of the things we did, and it's kind of just a stopgap, is we created a list of those components with some basic metadata on the OpenTelemetry website, and it's automatically updated, which yay for automation, because keeping the website up to date would be very difficult with all the changes. And then we link out to that README that they need to find. But that actually dovetails into a great ongoing, very exciting project in OpenTelemetry, which Ted and Marylia might be able to talk more about, but we have something called the Ecosystem Explorer, which is hopefully going to marry all of this documentation, all of this metadata into a really user-friendly website that people can use to find instrumentation details, versioning details, and also components.
Speaker 2 (00:20:09):
And then I have a question based on what you were saying: could you explain to people a little bit, if they aren't aware, what semantic conventions actually means?
Speaker 4 (00:20:18):
Oh, that's not a question for me.
Speaker 5 (00:20:24):
That term, I blame Google for that gigantic word. So really what semantic conventions mean is OpenTelemetry actually has a couple different layers of how we specify things. There's a specification for how a component should work that we have to write across a lot of different languages. So when people talk about the OpenTelemetry specification, they're usually talking about that. So we have SDKs and APIs and we need to implement them in Ruby and Node.js and Go and Java and .NET. And so we write a spec to make sure that all of those things work the same way in all the different languages. We want them to have language-specific idioms, but they all need to have the same functionality so that when you put the whole thing together, it's one big coherent system. So if we have to write code over and over again, then we put it in the spec.
(00:21:27):
Then there's the data protocols. OpenTelemetry through the collector tries to design its data to be something that can be translated into the major protocols out there. So we have our own protocol for metrics, but we also want to make sure we work well with Prometheus for example, but we have our own data protocol called OTLP. So OTLP is a definition for how we actually send the data on the wire, like what is the structure for conveying that data, but then there's the actual data itself that we're sending and we want that data to be normalized as well. In other words, if you are making an HTTP request and that's recorded, no matter what instrumentation library or what language or component is emitting that HTTP request and we're recording it, we want that data to be represented the same way so that when you're analyzing this data, it's simple.
(00:22:33):
And this is a thing that we had noticed, because we've all been working in observability for a long time, and that was kind of the one piece projects often didn't do. You would have these subtle differences, like how is an HTTP status code represented? Is it a number in one field and then the name of the status code in another field? Is it all caps, all lowercase, camel case, all of these little details. And if you can get them very uniform, especially when using AI and machine learning tools that are automatically analyzing the data, it really helps to have it exactly the same. So that layer of schema we call semantic conventions, and we call it that because that was the word Google was using internally, probably because people were having a nerd slap fight about whether this is a schema or not.
(00:23:30):
And so they settled on semantic conventions as a term to sort of make everybody happy.
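[Editor's illustration, not part of the call.] The normalization problem described above can be sketched in a few lines. This is only a rough sketch: the attribute name `http.response.status_code` comes from the OpenTelemetry HTTP semantic conventions, but the legacy field names (`status`, `statusCode`, `http_status`) and the `normalize_status` helper are hypothetical and invented for the example.

```python
from http import HTTPStatus

# Attribute name from the OpenTelemetry HTTP semantic conventions.
STATUS_ATTR = "http.response.status_code"

# Hypothetical legacy field names, invented for this illustration.
LEGACY_STATUS_FIELDS = ("status", "statusCode", "http_status")


def normalize_status(record: dict) -> dict:
    """Return a copy of record with the HTTP status stored as one integer attribute."""
    # Carry over every field that is not one of the legacy status fields.
    out = {k: v for k, v in record.items() if k not in LEGACY_STATUS_FIELDS}
    for field in LEGACY_STATUS_FIELDS:
        if field not in record:
            continue
        raw = str(record[field]).strip()
        token = raw.split()[0] if raw else ""
        name = raw.upper().replace(" ", "_")
        if token.isdigit():
            # Covers 404, "404", and "404 Not Found".
            out[STATUS_ATTR] = int(token)
        elif name in HTTPStatus.__members__:
            # Covers "Not Found", "not_found", and "NOT_FOUND".
            out[STATUS_ATTR] = HTTPStatus[name].value
        break
    return out


if __name__ == "__main__":
    for rec in ({"status": "404 Not Found"}, {"statusCode": 404}, {"http_status": "not_found"}):
        print(normalize_status(rec))
```

Three differently shaped records end up with the same integer attribute, which is the kind of uniformity that makes downstream analysis simple.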
Speaker 1 (00:23:37):
I also want to kind of shout out this previous episode that we did with Jay DeLuca. We had him talking a little bit about the whole episode was about auto instrumentation, but he did talk about the ecosystem explorer in there as well. While we're on the subject of semantic conventions, I know that there are many semantic conventions and not all of them have the status of stable yet. Some are still experimental. Oh, hi, Jay is actually here. Hi or whoever wants to answer, was there also like a requirement for a certain number or certain semantic conventions to already be stable or was that not part of the graduation process?
Speaker 5 (00:24:29):
So I can take that one. When it comes to stability, the feedback that we got, it's true there's semantic conventions that need to be marked as stable. And the idea there is we don't want to go breaking people's dashboards all the time by changing the data. In fact, in general, we're really concerned about stability in OpenTelemetry and not breaking things to the point that we had actually overclocked and sort of overemphasized going to 1.0 and marking things as 1.0. And this was actually feedback that we got from the CNCF as part of graduation, which was saying that, "Hey, we appreciate that you don't want to break people's stuff, but because you're so afraid to issue a 2.0 on things, you're camping out on 0.x for lots of stuff." And that's actually sending the wrong signal to users because you aren't saying that this component is dangerous to use in production, like it's 0.9, so don't use it in production because it might blow up or be harmful.
(00:25:44):
Actually, we have lots of components that are totally stable from a code perspective. They're completely safe to use in production, but because the data that they emit might change because the semantic conventions, we might do another pass at them, we were holding back on marking those as stable. So it wasn't actually the semantic conventions themselves that were the problem, but all of the instrumentation packages that use those semantic conventions were also marked as still being in beta as long as the convention wasn't marked as stable. So a change we made as a result of this graduation feedback is to go back through all of those instrumentation components and if they're maintained and useful in production, go ahead and mark it as 1.0 with the current data and if the data changes in the future, mark that as like a 2.0. So people know that it's a breaking change, but it's okay to go to 2.0 and 3.0 and things like that.
(00:26:47):
That will unblock some users, as Marylia was saying, who are not allowed to roll anything out that's still marked as beta. We have some users like that, but also just being on the same page as our users as to what those version numbers actually mean. Yeah.
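[Editor's illustration, not part of the call.] The versioning rule described above (promote production-ready 0.x packages to 1.0, then bump the major version when the emitted data changes) can be sketched roughly as follows. The `next_version` helper is hypothetical and is not the project's actual release tooling; it assumes plain `major.minor.patch` version strings.

```python
def next_version(current: str, data_breaking: bool) -> str:
    """Pick the next version for a maintained, production-ready instrumentation package.

    current: a "major.minor.patch" version string.
    data_breaking: True if the emitted telemetry changes in a breaking way.
    """
    major, minor, _patch = (int(part) for part in current.split("."))
    if major == 0:
        # Stable in practice but still on 0.x: mark it 1.0 with the current data.
        return "1.0.0"
    if data_breaking:
        # Changed telemetry is a breaking change, so signal it with a new major.
        return f"{major + 1}.0.0"
    # Anything else is treated as a compatible minor release in this sketch.
    return f"{major}.{minor + 1}.0"


if __name__ == "__main__":
    print(next_version("0.52.1", data_breaking=False))
    print(next_version("1.4.2", data_breaking=True))
```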
Speaker 6 (00:27:08):
I also want to talk a little about the semantic conventions, because sometimes we hear the feedback like, "Oh, it's just selecting what the name of the metric is. That's simple." It's like, "Have you ever got a bunch of engineers together at the same time?" And they say, "You just have to pick a name." So I can give an example from when we were doing the database conventions. That took over a year, because you have a group, and we try to get people that have knowledge of different databases, and you're going to say, "Okay, the name of the metric. Okay, one of the attributes is schema." Oh yeah, makes sense. Does every database have a schema? No, that does not make sense for all databases. No, no, let's call it another name. Oh, but that name doesn't work. Oh, okay. Now we define it like this: a very generic name that will work for all of them.
(00:27:57):
So part of it is actually you have to go to, for example, the SDKs and implement, do the proof of concept. So doing this, you realize, oh, that is not available at all in this particular language, so we cannot make it a requirement because we are never going to be able to send this for this type of database. So okay, it goes back. So part of making the semantic convention stable is defining this, making at least three implementations in different languages, going back and saying, "Okay, this is working. Now we can mark it as stable and other people can use it." But if you have, for example, the first proof of concept, we tried several names and then we realized, oh, that's not going to work out for a few others, so you have to change. So this is why the version keeps changing and might break: because we are still defining the name.
(00:28:46):
So that is one of the challenges of just marking things as stable.
Speaker 1 (00:28:52):
I mean, do you know that XKCD meme? I actually went and found it. It's like now there are 14 competing standards and then at the end of it, we need to develop something that's universal. Now there are 15 open standards, competing standards. But it's actually funny because OpenTelemetry's history is kind of the merging of different standards, like OpenTracing and OpenCensus were competing. They were different standards and they merged.
Speaker 5 (00:29:22):
Yes. And then we also merged the semantic conventions with the Elastic Common Schema. So when people invoke 927, I often am like, 927 has no power here because we're actually at negative two standards at this point. So we do a really good job on this front. And part of that is this process, it sounds like a bunch of nerds bike shedding, but I think one of the reasons why OpenTelemetry has had staying power is because there's this tendency to just be like, ah, we're just going to pick a thing and move on or we have a really high-handed opinion, but we try to really listen to the community. We try to listen to all of the existing things that are out there and even though it can make those debates get drawn out, when we do finally resolve one of these things, it's gotten a huge amount of requirements gathering and community input put into it.
(00:30:25):
And I think that's really important. And if you're trying to build a standard, that's the kind of work that you need to do. That's in general, when you look at the IETF and the W3C and groups like that, that's why they tend to move slower than open source projects where people are just breaking things all the time and changing them. It's the requirements gathering part of the work takes time.
Speaker 1 (00:30:49):
I really loved what you said in the GrafanaCON talk, Ted, where you were like, the update is very boring and we want it to be boring. That's actually what we want from an open standard. We don't actually want to be held to different standards all the time.
Speaker 5 (00:31:10):
That's actually when we had to name the project, that was the first official bike shed of OpenTelemetry was when we merged OpenTracing and OpenCensus, what were we going to call this new project? And there was some desire to have a really cool name, what I call the Pikachu Pokemon naming conventions, where we just come up with some cool random word. But we settled on OpenTelemetry specifically because it's very boring and describes what it does. And actually everything we name in the project, we try to just name it what it does. So if people read the name, they know what it does. They don't have to go look up something else. And just in general with telemetry, it can either be boring or it can be frustrating. It's hard for this stuff to be interesting. We want it to just be like air or water.
(00:32:08):
Software just describes what it's doing and you don't have to put any effort into making it describe what it's doing or any effort into analyzing that data. We want it all to just work. And in a sense, that means making it as boring as possible.
Speaker 2 (00:32:24):
So it means people can't play Pokemon name or medicine name or open standard in observability.
Speaker 5 (00:32:33):
Also,
Speaker 1 (00:32:33):
Pikachu is not arbitrary, Ted.
Speaker 5 (00:32:36):
What's a blue orchid in the context of computers? You have to go look up what that thing does now. Whereas if we call it OpenTelemetry eBPF instrumentation, well, you know what that thing does. So there you go.
Speaker 1 (00:32:52):
Also, I just want to say Pikachu is not an arbitrary name. Chu is the sound that a mouse makes in Japanese and Pika is because it's electric. It's an electric-type Pokemon and that's the kind of sparkling, sparking thing. Just saying.
Speaker 4 (00:33:11):
Sounds like we need to call on Pikachu names.
Speaker 1 (00:33:15):
Yeah, on Pokemon names entirely. So we talked about semantic conventions. What about the stability of the telemetry signals? Was that also part of the graduation process?
Speaker 5 (00:33:30):
That's sort of what kicked it off. We had this original goal of the project, which was tracing, metrics, and logs. There's lots of other things that we've added to the project since then. There's eBPF, as we mentioned, there's profiling, there's lots of different things we can do, but the original goal was tracing, metrics, and logs in the set of sort of common, broadly used server-side languages. So there's also mobile and browser and all this other stuff, but getting tracing, metrics, and logs all working, meaning that we have APIs for writing instrumentation, we have SDKs for sending that instrumentation, and we have instrumentation that covers enough stuff in the world that if you have a big heterogeneous software system that uses Go and Node.js and Ruby and .NET and Java and all this stuff, you can install OpenTelemetry and if your system's based off of common open source software libraries, you'll get enough telemetry out of the whole thing that you're not having to go back in and write a bunch of instrumentation by hand just to get coverage.
(00:34:46):
So that was the original goal of the project. And what kicked off graduation was us saying like, "Hey, we've arrived at that goal."
Speaker 2 (00:34:56):
And then so you were mentioning that was metrics, logs, traces. And then in, I guess, March at KubeCon, profiles also went into alpha. So did having profiles there at all have any effect on it, or was it just kind of like, "Okay, if we have metrics, logs, traces at a certain point, then we can graduate, and let's see where we go with profiles as well"?
Speaker 5 (00:35:20):
Yeah. When we discussed all this with the CNCF, I mean, profiling, all these things are great and we're excited about it, but we said from the perspective of evaluating us for graduation, all of these things are out of scope because we're always going to be moving the frontier and adding more things. So there's always going to be some part of OpenTelemetry that's in alpha or beta. It's never going to be like it's a hundred percent stable. So in terms of like, is it stable, is it broadly adopted, all of that is traces, metrics, and logs for these common languages that we wanted-
Speaker 2 (00:35:58):
Yeah. What I was more so wondering wasn't so much stability, but as opposed to just was curious whether it hitting alpha was a gate at all, but it sounds like it's just a separate-
Speaker 5 (00:36:07):
We just said, yeah, ignore that. Just pay no attention to the profiling behind the curtain.
Speaker 6 (00:36:13):
Yeah. The same, if you think about it: the graduation process, from starting it, took two years. So how were things two years ago? Because another signal we have is also baggage. Those things were not in the same state two years ago that they are in today. So when this started, it was like, look at those three, and those other things will come up. So things keep evolving, new things keep getting created. So it is hard to say, "Okay, now also look at this one that we just created this month. No, no, no, also look at this one." So I think this is why it was like, look at those three, and the extra is a bonus on top of what graduated. Well, if you take the comparison, you graduated from college, you can still do extra courses, and after you graduate, you do your master's, you do whatever.
(00:37:06):
So those signals are just on top of that basic thing that we defined. They're just like add-ons.
Speaker 2 (00:37:14):
I've got to say, this is the first time I've ever actually heard of baggage. Well, this kind of baggage. I've heard of it in other terms, of course, but I didn't know that that was one of the signals or what it even meant.
Speaker 5 (00:37:26):
Baggage has been around since the beginning. People are starting to talk about baggage now in the context of telemetry, but that's actually not the original point of baggage. One of the reasons why tracing was never really broadly adopted is that it's so much work to build a tracing system that propagates a context. To make any kind of tracing work, you have a little bag of attributes, your trace ID, your span ID, some other things, and you need to have that follow the flow of execution, not just through your process, but whenever there's a network call, that information has to get serialized somewhere and sent to the next hop and then deserialized and put into a thread local or somewhere where it gets propagated along. And that's a ton of work. Even though tracing is really useful context for logging and other things, people didn't really adopt it because it was just so hard to build that from scratch and get it out there.
(00:38:40):
So the point of baggage was saying, "Hey, now that we've done all the work to build this context propagation system, are there other kinds of crosscutting concerns that might want to use this other than just telemetry?" Security is the other big example: people might have security rules or information about whether this data is tainted or clean, or something like that. And baggage was a way to say, "Hey, you don't need to rebuild this whole system from scratch. If you're just trying to use our context propagation system for a different purpose other than telemetry, you could build that on top of baggage." Feature flags are another example of a system that could make use of baggage. So that's kind of what it's there for. But yeah, it's a little bit weird, right? It's kind of a secondary use case. So it just sort of sat dusty in the corner for a really long time while we were trying to finish everything else up.
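To make the propagation described here a little more concrete, the following is a minimal Python sketch using only the standard library. It is not the OpenTelemetry API: the helper names `set_baggage`, `inject`, and `extract` and the example key `feature.flag` are hypothetical. The only things borrowed from the real world are the `baggage` header name and the comma-separated `key=value` shape, which roughly follow the W3C Baggage format. One side serializes the current key-value pairs into an outgoing header, and the next hop deserializes them and makes them current again.

```python
import contextvars
from urllib.parse import quote, unquote

# Hypothetical in-process holder for the current baggage.
# A real SDK manages this context for you; this is illustration only.
_current_baggage = contextvars.ContextVar("current_baggage", default={})


def set_baggage(key, value):
    """Store a new copy of the baggage with one more entry (never mutate in place)."""
    updated = dict(_current_baggage.get())
    updated[key] = value
    return _current_baggage.set(updated)


def inject(headers):
    """Serialize the current baggage into an outgoing headers dict."""
    items = _current_baggage.get()
    if items:
        headers["baggage"] = ",".join(
            f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in items.items()
        )
    return headers


def extract(headers):
    """Deserialize baggage from incoming headers and make it current on this hop."""
    parsed = {}
    for member in headers.get("baggage", "").split(","):
        if "=" in member:
            key, value = member.split("=", 1)
            parsed[unquote(key.strip())] = unquote(value.strip())
    _current_baggage.set(parsed)
    return parsed
```

In this sketch, a caller would run `set_baggage("feature.flag", "new checkout")` and `inject({})` before a network call, and the receiving service would call `extract(request_headers)` to get the same pairs back. Nothing in it is specific to telemetry, which is the point being made about security rules and feature flags.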
Speaker 2 (00:39:37):
I'm just thinking of my luggage sitting in my bedroom with dust all over it. Sorry, Nicole.
Speaker 1 (00:39:44):
No, that's all right. I was going to say, so we've been talking about the feature requirement for graduation, which is like the ... We've talked about the semantic conventions and also the different telemetry signals and those three that you mentioned, Ted, they're all stable now, including baggage actually. We also talked about the diversity of contributors in the sense of having it not just one or a handful of companies that are doing it so that it's really vendor neutral. And then Marylia talked about the governance part of it, like having a kind of broader group of maintainers with different status as well. What about adoption? How do we measure that for an open source project? How do we measure how widely adopted it is?
Speaker 5 (00:40:37):
That's tough. Did anyone else have thoughts? I feel like I've been talking too much.
Speaker 1 (00:40:45):
Well, I know that Imma has been working a lot on translating a lot of the documentation into Spanish. I mean, I think documentation really certainly facilitates adoption, right? Because if you're not an English speaker, the English speaking internet is shut off to you, and that goes for open source projects.
Speaker 3 (00:41:05):
I think that's good for accessibility in general, because there are countries where English is not the main language. We think that doing translations is very good to allow anyone to access the documentation. Absolutely. And for me, that was one of the easiest ways in. If you want to contribute and you speak one of the languages that are being translated, it's probably the easiest way to get into OpenTelemetry. It forces you to read the documentation, to understand, to translate. And it's a great ... For example, in Spanish, I sometimes get people asking, "How can I join the group and do Spanish translations?" And we always welcome people who translate. It's very useful for the countries that don't have English as a first language.
Speaker 4 (00:41:47):
Yeah. I don't even know how many we have now, but I think we're closing in on 10 different localization teams. So it's definitely a great way to get involved. Nicole, about your question about measuring usage, I don't have an answer for that, but there is a new initiative going on right now designed to draw more end users into the project and also help people adopt OpenTelemetry in very complex production environments. And that project is called blueprints. We've published our first blueprint and we have several more being reviewed and in the pipeline right now. You can check those out, but they're basically tightly scoped documents that go from: what is the challenge that you're trying to address? What are the guidelines that the project recommends for addressing those challenges? And then how do you implement those guidelines in this specific environment? So we are looking for people who have implemented OpenTelemetry, or who work with customers who are implementing OpenTelemetry, to get involved in the project and help us write these, because you know best.
(00:43:19):
But also for end users who are looking for more guidance than just strict technical documentation can provide, these blueprints will hopefully help.
Speaker 6 (00:43:30):
And also we do have a page of adopters of OpenTelemetry. Of course, probably the majority of people who use it will not put their name there. So we do have some sense of how many people, but yeah, the majority of people probably just don't add themselves there.
Speaker 2 (00:43:49):
I have a completely unrelated question. So earlier, Marylia, you were talking about people contributing AI slop and things like that, which we obviously don't want. AI is becoming more and more talked about, more popular, et cetera. Can someone talk a bit more about how AI shows up in the good sense in OTel? There are the GenAI semantic conventions, and I don't know what state they're in, and things like that. Yeah, someone can talk about that more, maybe Ted or someone else. I don't know.
Speaker 5 (00:44:23):
Yeah. So I would say there's three aspects of how AI is affecting the project, and two of them are very positive and one of them is a pain in the butt. The first positive one is that all of these agentic AI systems consume a lot of resources. They can get slow, they can fail. We're trying to debug them and understand what they're doing. And so we need telemetry out of these systems. And since there's been an explosion of interest in deploying and managing GenAI and agentic AI, there's been rapid growth of the GenAI SIG within OpenTelemetry, which is very rapidly trying to come up with standards and instrumentation and all of the stuff you need so that people who are deploying and managing these systems can do it the OpenTelemetry way. So that's been very positive. There's a whole crop of new startups that are focused on AI observability, not using AI to observe things.
(00:45:30):
I mean, that's also a thing everybody's doing, but saying we're focused on observing AI systems. So that's been a big growth area within OpenTelemetry. So that's very positive. I think that's a place where we can really make a difference in the world. Another place where it's very positive is maintainers. Everyone working on OpenTelemetry is starting to use AI coding to assist them and try to move faster. So we don't have a big organized effort, but all the different SIGs are starting to build coding skills and things like that that are OpenTelemetry focused. And I have a lot of hope that we can leverage this specifically for instrumentation. The one place where we would love more contributors and we need a lot more help is managing all of this instrumentation. We have the semantic conventions, but then we have all these instrumentation packages that are out there and that's actually the biggest surface area.
(00:46:32):
And we have some tools like Weaver that we're working on that are not AI tools but could help AI tools be more accurate in the code that they write. And this is really an area we'd love to focus on in the next year: how can we use tools like Weaver and testing harnesses to constrain the problem space enough that AI coding tools can be really accurate and help us go from ... We already write specs, and we see this as a pattern in AI coding: you write the spec, and if you've written the spec in enough detail, you should be able to use coding tools to then implement the spec. So we really want to see that happen. That's all very exciting. Where AI has been a huge pain in the butt is that not just OpenTelemetry, but every open source project out there has been hit by just an enormous amount of noise, just very noisy activity.
(00:47:36):
You didn't need AI to be like, "You know what I'm going to do? I'm going to write a script that just scans every repo on GitHub and opens a pull request to correct every single misspelled word that I find." So one pull request per misspelled word across all of GitHub. You didn't need AI to do that, but you didn't do that because your brain just slammed the door on those intrusive thoughts and said, "Don't do that, Ted. They'll run you out of GitHub town if you do something like that because that's really annoying for maintainers to have to paw through that stuff." So there's some stuff that's genuinely malicious, like people trying to train up these bot accounts so that maybe they can use them for supply chain attacks and stuff like that. But even the people trying to maybe be productive in some sense, they're not really thinking about how their work is impacting the maintainers and the people who have to process all of the noise coming in.
(00:48:37):
And it almost feels like maintainers are starting to have to become Reddit mods with 30-day mute buttons and all of this other stuff. So that's a struggle we're trying to figure out along with all the other open source projects.
Speaker 6 (00:48:54):
Yeah. I think one word is just, when you're contributing to something, think about empathy. There is an actual person on the other side. So if you create 10 PRs and keep being in there like, "Hey, where's the merge? Why isn't that merged? Where's the merge?" Did you help review other PRs? Is the quality of the thing that you created actually good? Can you explain it? So that is why you should keep this in mind. We actually have one issue open where one idea we had is to limit the number of open PRs from people who don't have write access, just to help control this amount a little, so we can probably limit it to three or five PRs per person at a time. So this is a way we don't get super flooded as well, because remember, maintainers are people too.
Speaker 1 (00:49:45):
So we're already talking about one of the things that will change the OpenTelemetry landscape, which is AI. What are some other things that you can see happening or that you would like to see happen post-graduation?
Speaker 5 (00:50:02):
Oh, geez. So we've got a post-graduation roadmap going and this is all of the things that are kind of like the boring stuff, the follow-up to graduation, what do we need? There's all this new, exciting stuff like profiling and mobile and all of these new things, but we still want to make sure we've completed everything we need for all of the original goals and that can kind of be divided into a couple categories. One is stability as we mentioned before, just going through and making sure everything has been marked as stable and brought up to the latest definitions of semantic conventions and things like that. But there's also deployment. We want OpenTelemetry to be easy to deploy everywhere all at once by someone who's more like an operator. A lot of organizations, especially at big companies, have an infrastructure team or an observability team, some centralized team whose job it is to manage this sort of stuff.
(00:51:08):
And we have tools like the Kubernetes operator and Helm charts and things like that. But outside of Kubernetes, we didn't have a way to just sort of automatically install OpenTelemetry and have it auto-instrument everything. And even with the Kubernetes stuff we have, you still kind of have to go service by service and sprinkle a little bit of configuration in there. So one big effort right now is to improve all of that so that you can just be like, "apt-get install OpenTelemetry," and boom, it just works for everything. Or if you're using the operator, it will just by default be able to auto-instrument everything in every language that it finds, and then you're kind of pulling it back from there. So those are kind of like the two big pieces. And then there's also just security, improving how we actually do project management, improving self-observability of OpenTelemetry.
(00:52:10):
Ironically, there's a bunch of cases, like if your SDK or the collector is dropping spans or dropping data because it's overloaded, where we could have some improvements. So all of those little loose ends we're trying to tie up, and we have a community roadmap doc that we're working on right now about that. So that's important along with all of the new stuff that people want to focus on.
Speaker 6 (00:52:35):
Yeah. I want to bring that ... So we started with creating, creating, creating, creating a bunch of things, but how can we make all of this actually stable and easy to use? So a lot of the conversation now is about how we make this easier. If somebody has never heard about OpenTelemetry and is told, "Okay, put in OpenTelemetry today," they'll be like, "What is a collector? Do I need this? What is the SDK? What does it do compared to whatever?" So a lot of it, I think, is ease of use. And one other, I'll call it a project as well, that also became stable recently is declarative configuration. This is a way, because if you're going to initialize an SDK, sometimes you have to do things manually. You have to start your own tracer, you have to start your own things, and it can of course vary by language.
(00:53:30):
So if you're working at a company where part of the project is in JavaScript, part of it's in Java, and then you have a few in .NET, you have to learn how to initialize everything, all of them. It takes time, but now with declarative configuration, it's just a single YAML file that can be the same for all of them. You just say what values you want, like the size of your batch, your endpoint, and things like that. So the person who is on the operations side can just have this file and only make updates to it. They don't need to see your actual application. It's just pointing to that file, and it's going to work the same for all languages.
(00:54:18):
So that one became stable recently, and we now have SDKs that actually implement it. It's not available for all of them yet. We have it for some of them at different levels of, well, not stabilization, more like feature availability. But that is also one of the goals.
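As a rough illustration of the "one file, same values for every language" idea, here is a small Python sketch using only the standard library. It makes several assumptions that are not from the call: real declarative configuration is a YAML file with its own schema, while the text below is JSON purely so the sketch runs without extra packages, and the key names, the `SdkSettings` type, the `load_settings` helper, and the endpoint value are all hypothetical placeholders.

```python
import json
from dataclasses import dataclass

# Hypothetical shared settings text. The key names are illustrative only
# and do NOT reproduce the real declarative configuration schema.
CONFIG_TEXT = """
{
  "exporter": {"endpoint": "http://localhost:4318"},
  "batch": {"max_export_batch_size": 512}
}
"""


@dataclass(frozen=True)
class SdkSettings:
    """The values an operator would edit without touching application code."""
    endpoint: str
    max_export_batch_size: int


def load_settings(text):
    """Parse the shared file once; each language's SDK would read the same values."""
    raw = json.loads(text)
    return SdkSettings(
        endpoint=raw["exporter"]["endpoint"],
        max_export_batch_size=int(raw["batch"]["max_export_batch_size"]),
    )
```

The design point is the separation of roles: the application only calls something like `load_settings`, and the person on the operations side changes the endpoint or batch size by editing the file.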
Speaker 1 (00:54:40):
Yeah. I wonder if we could also, just to sort of wrap this up, maybe we can each talk about how we first got started in OpenTelemetry and also if you were starting now, what would you do? Maybe let's start with Ted because you started it.
Speaker 5 (00:55:00):
Yeah. So I come from the OpenTracing side of the family, and I got interested just because I got frustrated with having to observe these large scale systems that we were maintaining. I've been working in distributed computing for a really long time, but I was specifically working on a project called Cloud Foundry, which is sort of like Kubernetes. "Open source Heroku" is the easiest way of describing how Cloud Foundry worked. And so I wrote the container scheduling system for that, or helped design it, and we would have to debug this gigantic important operating system for very, very big end users, and we wouldn't even be operating it. They would be operating it and they'd be like, "We have a problem." We'd be like, "Cool, well, send us what data you have." And then we'd get five terabytes of logs and we'd have to paw through all of this.
(00:56:04):
And I really started to feel like this could be better. If we were to design observability and telemetry from scratch today, it wouldn't be the way that we're currently doing it. The whole three pillars thing was really more like, this is just how humans kind of tack more things on to the stuff that they already have. It's not like a coherent way of designing a telemetry system where all of the data's connected into a graph that you could use computers to analyze. So I got very interested in that and since I love working on open source software, got very interested in the tracing side because that's sort of like the missing glue. We had these other pieces, but the tracing to me is like the glue that kind of binds the other stuff together. So that's how I got interested in it and it turned out a bunch of other people were interested and it was just kind of like the right time for a technology wave to come through and to sort of standardize.
(00:57:08):
Okay.
Speaker 1 (00:57:09):
Well, I can go next because mine's very different. I came across it because I love all things open source, but also, there's so much that's out there and it's really impossible to know everything about everything. I got bought in by the promise of OpenTelemetry being like, you just instrument it and you don't actually have to care about every single metric or whatever, because I don't know what I really want to know. I just want to do the right thing, whatever that means, and I'm perfectly happy to get everybody else's best practices. And how I got started was, Grafana is an engineering heavy company, and at the time Juraci Kröhling was working with us, he's still a friend, and I asked him onto a live stream and I got to ask him all my very beginner questions like, "What is OpenTelemetry? How does it work?
(00:58:06):
How to instrument?" And I found out that people actually like beginner content like that. I'm not the only one who didn't know what all that stuff was. So if I were getting started now with OpenTelemetry, I would really recommend and encourage people who feel like they're beginners to just document their journey because there are always people who are not as far along that path as you, and I think it will still help. I'm going to popcorn it over to Imma.
Speaker 3 (00:58:41):
In my case, I was working at my previous company, Elastic, and I was working with customers, and one of the pain points was actually naming things, and for me, semantic conventions were lifesaving. At the time it was Elastic Common Schema. And then my conversations with customers were so easy because at that point it was like, "Just use this. Leave me alone. Just use this. Let's move on to the next thing." And then when I saw that this was donated to OpenTelemetry, I got very interested in OpenTelemetry, and since I was already a cloud native organizer and I'm really into CNCF, I thought this is probably the best project for me to contribute to. If I had to start now, I'd probably go the same route I did and start with the documentation in Spanish. It's a very nice group, very welcoming, and this is great for me because it forces me to read the documentation and understand everything that's in there.
(00:59:34):
So that's my recommendation. Maybe now Marylia.
Speaker 6 (00:59:43):
For me, the love started with observability first. I worked on several different things, and then at my prior company, when I started working on actual observability, I was like, "Oh, that is the thing that I like." It took me 10 years of trying out different things, but I was like, "Oh, this is the thing that I like." So I decided I wanted to work at a company where the goal is observability. And I had worked a little with OpenTelemetry before, but it was on the end user side pretty much. We were emitting metrics for whoever wanted to use them. So when I joined Grafana, I actually interviewed for a few different teams, but I really liked mine. There was OpenTelemetry indicator. I was like, "Oh, I really like this. I had already worked with it a little before." So I just joined, and then I just keep finding things that I'm interested in.
(01:00:31):
So this is why I mentioned I have status in several things: I started with JavaScript. I was like, "Okay, we need this feature." And then, after I started, I was like, "Oh, but I know this group of Brazilians that were helping with documentation." I was like, "Okay, let me help with the documentation in Portuguese." And they're like, "Oh, we're going to start semantic conventions for databases." And at my prior company, I was working on observability for databases. So I was like, "Okay, I have experience with that." So I joined that team, and then it was like, oh, but we also need that help, because one thing that I really wanted was to help new contributors. So I was like, "Let's start a new SIG about contributor experience." So I help out with that one. And it was just that each thing you find is so interesting that I keep adding, and now I am also part of the governance committee, a member of that one, because I was like, oh, I want to help out on that level as well.
(01:01:21):
So I think it's just a project that is so huge that you're going to find something that fits you, because there are a lot of options. I just happen to enjoy a lot of those options. So this is why I'm involved in tons of stuff.
(01:01:39):
Yeah. Well, I was going to say, now it's Tiffany and now I'm going to see which one is going to start first.
Speaker 2 (01:01:47):
I guess I can. Yeah. So when I first switched from hardware engineering to software engineering, I joined a software team that was working on a project called Snap, which was an open-source telemetry framework that doesn't exist anymore, but that's what got me to first know about Prometheus and Grafana. And then I went away from that to an extent, went into Docker and Kubernetes, and that was a thing. And then I met my now good friend, Matthias Haeussler, and we talked about doing a talk at KubeCon together. It wasn't OTel or observability related, but then later he was ...
(01:02:25):
I asked if I could join, if we could do some talks together, and we started submitting more. I missed submitting to a conference, but then he asked if I wanted to join a talk that he was already doing that was on observability. So then I started doing talks with him on observability, and then it started becoming observability and OTel specifically. So I started doing that a bunch, which got me here. And I haven't specifically worked on things like actual contributions to OTel itself. Those are things I want to start doing, but I have been focusing on creating talks or blogs or live streams and things like that. So those are also ways that people can contribute, even if they don't feel like they can submit a pull request to the GitHub repo, whether it's docs or code or whatnot. And then I guess other Tiffany.
Speaker 4 (01:03:14):
Yes, I can be the other. I know we're almost at time, so I'll keep it short. I was a librarian in a previous life and I wanted to change careers to technical writing. So I was looking for ways to get experience so I could land my first job and I met the head of documentation for all of CNCF at a technical writing conference and we had a chat and he recommended that I check out some of the projects because they all need documentation help and I landed on OpenTelemetry. I lurked for a good long while in the community before I actually did anything. I highly recommend doing that. If you're nervous about getting started in a big community like that, just hang out and see how people talk, see what they care about, see what they're like. And then I started making my first contributions.
(01:04:11):
Some of them were original contributions, but mostly I was reviewing and copy editing other people's PRs and so I never stopped. So still doing it today.
Speaker 1 (01:04:29):
Awesome. Thank you everyone for being on here. I feel like this is a momentous occasion in the history of OTel and I was super happy to have you all here and talk about it and explain what it took to get here. It feels like a huge community effort.
Speaker 5 (01:04:48):
For sure.
Speaker 4 (01:04:49):
Thanks very much.
Speaker 6 (01:04:51):
And to everyone who is listening and helped contribute so that we got to this place, thank you. A lot of people are part of it. So thank you all, and congratulations to all.
Speaker 1 (01:05:04):
Yeah. All right. Well, graduation isn't the finish line, so we can all contribute to what OpenTelemetry is going to be in the future. Thank you everyone for watching. Thank you for joining. If you're watching this after the fact and you would like to ask any questions, still leave them in the comments and I'll make sure that they get answered. Otherwise, join us for the next community call in a month or so. Thanks everyone.
Speaker 2 (01:05:30):
Also, there is Slack for the community.com and there's an OpenTelemetry channel there as well.
Speaker 1 (01:05:39):
Awesome. Thanks everyone. Have a good rest of your week. Oh, sorry, Tiffany. Sorry,
Speaker 2 (01:05:43):
I was saying there's also an OTel channel in the CNCF, not just us. Sorry, bye. It's