# [[kOH 44 - Private load zones for load testing]]

%% Related: - %%

# Private load zones for load testing - k6 Office Hours 44

## The video

<iframe width="560" height="315" src="https://www.youtube.com/embed/sqKc95zdXyI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Timestamps

0:00 Intro to Samuel and the k6 backend team
10:23 Issues in scaling up the k6 backend
16:50 What metrics do we capture on k6 Cloud?
26:19 Upcoming features that the backend team is working on
33:39 What is a private load zone?
37:41 Demo: How to use a private load zone in k6 Cloud
41:18 On-premise load testing vs private load zone testing
50:05 Storage concerns on k6 Cloud
52:19 How do we start load generators so quickly on k6 Cloud?

## Transcript

- Hello everyone, and welcome to another k6 Office Hours. I am Nicole van der Hoeven as usual, and today I have two people with me. One of them you may remember from a previous k6 Office Hours, and that's Pawel Suwala. Pawel, how are you? Why are you even here?

- Yeah, that's a good question. Why am I here? Since we have Samuel, Samuel should be doing all of the talking, right? I guess I will help and answer some questions that come up. As for who I am, I'm still the CTO of k6 and I'm still doing many different things: project management, planning, organizational stuff. I do some coding from time to time when I get the chance. I'm meeting customers, doing all kinds of things. Whatever shows up during the week, I'm the one picking up the things that aren't anyone else's responsibility, I guess.

- And welcome to the stream, Samuel Regandell. Am I pronouncing that right? I probably should have confirmed with you beforehand.

- It's a Swedish thing. So REH-GEHN-DEHL is actually the--

- Regandell.

- Yeah, the emphasis is on the second syllable. So I'm Samuel. I'm Swedish and I'm sitting here in . I'm one of the team leads for the backend, and that's basically it. I've been at the company for five years now and have seen a lot of changes over that time.

- Five years, wow. How long have you been here, Pawel?

- I think in a couple of months it will be four years for me. And I feel like Samuel has been here for longer than five years, probably.

- No, I think it's five. Actually, a year ago I used to walk around and tell people that it had been at least five years, and then I actually counted and realized that was exaggerating.

- That's probably why.

- Yeah, so I was myself confused by this.

- So I'm the newest one here. I've only been at k6 for a year and three months or so, but I'm glad to have you both here. Samuel, why don't you tell us what you do for k6 on a daily basis?

- So I'm leading one of the two teams of backend development, which is actually a pretty recent development. We used to be only one team, which I led, but we are now split into two because we're getting to be more people. So I do the whole people-management aspect of the team, but also the day-to-day operations of planning: we plan for projects. I also do a lot of development myself. So a lot of backend work, the planning going into it, and day-to-day operations.

- What is your favorite part of the job?

- For me, I think the favorite part of the job is the creativity of it.
I have always seen coding as a creative endeavor, and I do a lot of other creative projects in my free time. Development and programming is just one other aspect of that, I think. Coding is a mixture of creativity and technology, right? You have to follow certain practices, et cetera, but it's not that different from any art form in the sense that you need the technical skills to do it. But there is still room for expression, room for your own design and your own thinking in it. I think this is a good thing, and something I really appreciate about it. And then of course I'm also a manager, so I have a chance to talk to people and get to know people and handle that aspect of it. The social aspect is of course also valuable, especially these days when everyone is working from home. As you see, I'm also working from home.

- Samuel has another secret job on the k6 team. He is the organizer of the monthly k6 board game night.

- It's true, that is true. I have a set of board games behind me here, but I should tell the audience that I cannot beat Nicole's .

- We play virtually. Anyway, honestly, I haven't played many of my physical board games in a while. But Pawel, was it your decision to separate the one backend team into two, and why was that decision made?

- I wouldn't say it was my decision, because the decisions that we make in k6 are always built around consensus. I cannot make decisions like that on my own and dictate how the team will be organized. It's always a discussion of how we organize ourselves when we grow. We seek input from many team members, reach consensus, and then we decide something. So I cannot really take responsibility for this decision, but I can tell you why we decided to do it. As all of us know, we are growing quite quickly and we are hiring more and more people. At the moment, I think we have 11 engineers purely in the backend, and we are planning to have 14 in short order. Having 14 people in one group is a little too many. So the primary reason is that we wouldn't be very efficient, and we decided to split for that reason, but there are many reasons, right? Samuel can pitch in here.

- There are also practical reasons. I try to talk to everyone every week, to have at least a half-hour to an hour meeting with everyone face to face, or at least camera to camera. Practically, having 10 other people to talk to in a week would mean that's all I would do. From that perspective, if I also want to do any coding work or any other work, it would be tough to keep up quality, at least for me, if I'm just going from meeting to meeting trying to keep up with everything. So splitting was a practical matter. And as Pawel mentioned, with a smaller group, even a meeting within the smaller team makes it much easier to actually go deeper into the problems people have, compared to all 11 people trying to speak. You just can't fit that into an hour; it takes much too long.
So yeah. But we also try to have a weekly group meeting between all the members of the backend, both teams, all 11 people, in order to keep a social relationship with each other and keep cohesion between the backend teams. We don't want one team to be isolated from the other, because we are all working on the same code and we are all collaborating on things.

- Yeah, that totally makes sense. I posted a link in the chat to the positions that we're looking for right now. It looks like we only have one up there at this moment, but Pawel, you already alluded to the fact that we are still looking for many positions, or we will be soon. And some of them are going to be on the backend team, or one of the two backend teams, right?

- I think both of the backend teams will have openings quite soon, I would say in two or three months. At the moment we don't have those openings because we have hired quite a few new people, and we don't want to go too fast and just create chaos. How many new people are we onboarding right now?

- Two, three, I think. Well, it depends on what you mean by onboarding. We have four people who started basically this year, so they are in different stages of onboarding. I think it's five in total since last October or November. So we don't want to grow too fast in that regard. You have to allow people to acclimatize to the environment, get them going, and help them out and so on. We don't want to expand faster than we are able to properly absorb the newcomers.

- But I think we will open those positions around May this year, something like that. So if anyone is interested, keep an eye on our open positions and apply.

- Yep. So 11 people and three more positions coming in the next few months. Why do we need such a big backend team? What are some unique issues that we might have for k6 and k6 Cloud?

- So what I call the backend here is the k6 Cloud offering. It has several responsibilities apart from the SaaS basics, like being able to set up an organization, get subscriptions, et cetera, all this meta stuff. What we also do is ingest the data, either from your locally running k6 or from you telling us to run k6 for you in the cloud. We absorb a lot of data over time, and that requires a big system. We are also trying to gradually upgrade this and update it to newer standards. I mean, we created the system back in a time when many standards around metrics were, to some extent, not really available yet. Internally we are shifting to these standards, but even existing tooling doesn't quite do what we need yet and isn't quite catching up to what we can do in the cloud. This takes a lot of human resources, of course, not only to upgrade our current system but also to maintain what we have, because we are growing and getting more and more people using the system, which means more and more metrics going through it and more and more tests running in parallel.
So the optimization and performance improvement of this is an ongoing thing we need to be careful about. I mean, Pawel remembers the limits we had on the system only a few years back, and how they have improved dramatically since then. But yeah, there's still a lot to do here.

- Yeah, and I would add that we are getting more and more customers, and not only that: customers are building more complex tests, tests that have many custom metrics, many modules, using extensions, using JavaScript libraries. All of that needs to work flawlessly in the cloud, because customers are using us in CI environments. If a k6 Cloud test fails, it creates disruption in the customer's entire build pipeline. So we are trying to be as reliable as possible, and it doesn't happen on its own. There is a large engineering effort to make this reliable, because of course k6 customers and users can write arbitrary JavaScript, and that JavaScript can do almost anything. It needs to run well on our servers, so there is always work to be done with all of the inventive scripts that customers are writing.

- Yeah. And actually, I think there are a lot of reasons to use k6 Cloud. Some of them are the ease of use and the nice interface: having that as an easy way to walk people who maybe have never started a load test before into the whole world of load testing. But personally, the thing I find the most value in from a load-testing-as-a-service platform like k6 Cloud is the fact that it handles all the results for me. I always think it's such a shame that we have a tendency to focus a lot on the scripting of a load test, when actually a lot of the value you get from running a load test, or performance testing in general, is in the results. And load tests just generate so much data, right? This is part of the reason why it made sense for Grafana to acquire us. Having had nothing, and having had to figure out how to first collate the data from several load generators, make sense of it, and aggregate it somehow (because it's very difficult to maintain all of the raw data for all time), that's one of the selling points of k6 Cloud: all of this is just handled for us. You don't even have to worry about how it's done, and it's the backend team that is primarily responsible for that part of it.

- Yeah. As you say, it's the aggregation of data from a potentially huge number of separate load generators. All these time series coming in need to be aggregated in a way that makes sense, because you not only have a large amount of data, you also have a large range of values. You have, say, response time values coming in from a few milliseconds up to several seconds or even longer. That's several orders of magnitude of scale. So averaging over that, for example, would not work; that would just be adding a small number to a big number, and you would end up with problems. So yes, there's a lot of complexity in being able to handle this data in real time.
Well, as close to real time as we can get it. There are always improvements we can make here, because the user wants results as quickly as possible, obviously.

- Yeah. So can you walk us through what actually happens when, for example, I start a load test on k6 Cloud? Let's just say one load generator, a handful of users. What metrics do we capture from that? Is it everything from the built-in k6 metrics plus custom metrics? Is it raw? Do we process them on the fly and only save the aggregated values? How does it work?

- So what happens is, taking your example of one load generator, your k6 itself is first aggregating a bit. Each load generator aggregates and sends batches of data to the cloud, I think every six seconds, and we receive that. It actually goes through the same ingest regardless of whether we are running k6 for you in the cloud or you're running it yourself with --out cloud and just pushing the data to us. The data comes in, we unpack and deserialize it, and then we queue it up for storage into the database. This database is basically a Postgres database that we use with the TimescaleDB extension, which is optimized for time-series data storage. But we don't put the individual data points from k6 into the database. For one single load generator that might actually be reasonable to do, potentially, but for big tests you can have a large number of load generators all dumping data into the same pipeline. Then, within every time slot, you're going to have a lot of values for every metric for that particular slot. And as I said before, they're all going to be spread over a potentially large range of values, because of the way networks work; it can be very noisy data. So we apply something called a dynamic range algorithm, where we basically distribute the data across a sort of log scale (it is a log2-based scale), and this is what we then store into the database. An advantage of this approach is that we can be sure of the error we have. If you were to just take the plain average, or maybe some sort of weighted average, over each time slot, you don't know if you're throwing away spikes in the data, because the average will smooth them out. More importantly, you would not know the maximum error you would be getting. But when you do something like a dynamic range, dividing the data into buckets, you know how many buckets you're placing values into, so you have a way of saying: we have this many buckets, so we won't have a worse error than 1%, for example. It also cuts down dramatically on the amount of data that you are storing, because it no longer matters how much data is actually coming in. You're aggregating it into these buckets; you'll always have the same number of buckets, just different amounts of content in each. So storage-wise, it has helped us a lot.
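To make the dynamic range idea concrete, here is a minimal sketch of log2 bucketing in JavaScript. The constant, the names, and the exact bucket math are illustrative assumptions, not k6 Cloud's actual routines; the point is that bucket width grows with the value, so the worst-case relative error stays bounded no matter how noisy or widely spread the samples are.

```javascript
// Minimal sketch of log2 ("dynamic range") bucketing, assuming positive
// values >= 1 (e.g. response times in milliseconds).

const SUB_BUCKETS = 64; // buckets per power of two; more buckets => smaller max error

// Map a raw value to a bucket index on a log2-based scale.
function bucketIndex(value) {
  const exp = Math.floor(Math.log2(value)); // which power-of-two range
  const lower = 2 ** exp;                   // start of that range
  const sub = Math.floor(((value - lower) / lower) * SUB_BUCKETS);
  return exp * SUB_BUCKETS + sub;
}

// Aggregate a batch of samples into sparse bucket counts; this is what would
// be stored instead of the raw data points.
function aggregate(samples) {
  const counts = new Map();
  for (const v of samples) {
    const idx = bucketIndex(v);
    counts.set(idx, (counts.get(idx) || 0) + 1);
  }
  return counts;
}

// Within [2^e, 2^(e+1)) each bucket spans 2^e / SUB_BUCKETS, so the relative
// error of representing a sample by its bucket edge is at most 1 / SUB_BUCKETS
// (about 1.6% here), regardless of how much data flows in.
```

Storage is then bounded by the number of occupied buckets rather than the number of samples, which matches the point above: the incoming volume no longer dictates the database footprint.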
I'm saying this because we used to store all the data as it came in, and it just ate up the database very quickly as we grew more popular. Plus, we didn't know what the error was: the stored averages could potentially be off by something like 50% from the real values. So at that point we started storing these structures into the database instead. It's all custom-written routines today. Once we have that in the database, we run other processes on it, like precalculating the 95th percentile, or all sorts of different useful things that the user will need, but it's all operating on these buckets, on these dynamic range structures. And then of course we apply, in various ways, the thresholds that are involved. We also do what we call auto analysis, where we actually analyze the result and try to give some reasonable advice: what the heck are you seeing in this data? That's another thing that adds value for the customer. This is all done at various stages of the test run. Thresholds need to be evaluated on the fly, as the test happens, as much as possible; things like the analysis are generally done on the whole data set, so you can get a feeling for it. This is also why, as some users may have noticed, it takes a little while before a threshold is actually enforced, even though they can visually see the threshold being crossed. It takes a little while to process things and have them go through the whole system. So even though you can visually see something happening, it takes a moment before the threshold processor has finished working on it and realizes: okay, we crossed it, stop the test, or whatever you wanted it to do.

- You know, I didn't even know until now that we don't take the average. We take the dynamic range, which is really so much better, because averaging is a problem. I know Samuel and Pawel know this, but just to explain for others who may not: if you have maybe 10 data points, and nine of them return a response time of one second, but the tenth returns 10 seconds, the average of those would be 1.9 seconds. And you might think, oh, it's less than two seconds, that's okay. No: the 10 seconds is a problem. If you just take the average, you're smoothing it out, which is what Samuel was saying. You're missing that valuable information about the spike, because maybe it's not a problem at the load you're running now, but maybe you'll see more of it when you're running thousands of users, or maybe you'll just see it in production and wish you had been able to see it in your test results. So I think it's fascinating that we're not taking averages. That's really awesome.
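Continuing the sketch above (same `SUB_BUCKETS`, `bucketIndex`, and `aggregate`), this is roughly how a percentile such as p95 can be estimated from the stored buckets instead of the raw samples. Again, an assumed illustration of the general technique rather than k6 Cloud's code; note how Nicole's single 10-second outlier stays visible as an occupied high bucket instead of being averaged away.

```javascript
// Inverse of bucketIndex: representative value at the bucket's lower edge.
// Assumes non-negative indices, i.e. values >= 1.
function bucketLowerBound(idx) {
  const exp = Math.floor(idx / SUB_BUCKETS);
  const sub = idx % SUB_BUCKETS;
  return 2 ** exp * (1 + sub / SUB_BUCKETS);
}

// Walk the buckets in order until the cumulative count reaches the target rank.
function percentile(counts, p) {
  const total = [...counts.values()].reduce((a, b) => a + b, 0);
  const target = Math.ceil(total * p); // rank of the percentile sample
  let seen = 0;
  for (const idx of [...counts.keys()].sort((a, b) => a - b)) {
    seen += counts.get(idx);
    if (seen >= target) return bucketLowerBound(idx);
  }
}

// Nicole's example: nine 1,000 ms samples and one 10,000 ms sample.
const counts = aggregate([...Array(9).fill(1000), 10000]);
console.log(percentile(counts, 0.95)); // lands in the ~10,000 ms bucket
```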
- It's also a computational thing, because you have such a wide range of values, multiple orders of magnitude, as I said, from milliseconds all the way to multiple seconds. Numerically, computers don't like adding a really, really small number to a really, really big one. You're going to lose precision just by doing that.

- So right now, when someone runs a test on k6 Cloud (so not using --out cloud, but actually running it on k6 Cloud), where does that actually run?

- It runs on Amazon. Our own system is running in one of Amazon's load zones, one of Amazon's data centers. But as you know, you can specify in your script which load zones you want to use, like somewhere in the US, to spread the load out depending on what kind of traffic you want to emulate. So our controlling software, so to speak, runs in one computing center, but that in turn spins up instances running what we call the k6 agents: k6 instances all around the world in Amazon's network.

- So Pawel, what are some of the concerns for the backend for the next few cycles? I know you have something specific in mind for that as well.

- Yeah. So we have been growing so fast, and we have these customers running very complex tests, so we decided that in the next cycle (we work in cycles of eight weeks) we will focus only on reliability in k6 Cloud. We'll not be adding any new features at all. We'll just be working on making sure the system is stable and that the observability of our components is top-notch. We'll be making sure our deployments are not disruptive to customers. We'll be setting up internal processes within the company on how to deal with outages and disruptions, so incident management processes. We'll also establish more detailed processes for handling level-three support, the support that our CS colleagues cannot handle themselves and that lands on developers' desks. And many, many processes of just making k6 more reliable: maybe responding quicker to customers, maybe evaluating thresholds a little quicker than at the moment. Reliability is the theme of the next cycle.

- Yeah. And that of course falls a lot on the backend side, because everything is connected to the backend one way or another.

- Absolutely. And I would say it's easy to think that when you run `k6 cloud test.js`, it is such a simple thing, but really the complexity of executing a test in the cloud is massive. You just execute one command as a customer or user, but in the backend we might be spinning up 100 servers to handle the traffic: to generate it, handle it, store it, and process it. So it's an immensely complex system, but it looks super simple from the user's perspective, exactly as we designed it, right?

- That is the whole point. Yeah.

- Yeah. I mean, I think that's how you know Samuel and the rest of the team are doing their job really well, because it seems so easy, but it's definitely more complex underneath. But we've talked about metrics and haven't talked about the actual instances being spun up. How do you decide, if I run a hundred users versus 10,000 users, how many instances of k6 you need to spin up? Or how many AWS machines you need to provision? How does that all work?

- Yeah, this is a complex problem.
We actually run a special service for this, which is partly derived directly from the configuration code of k6 itself, just to be compatible. At some point we actually had our own validation mechanisms, et cetera, in Python in the backend, but it's better to be compatible and make sure it's always up to speed with what you use on the CLI on your own. What we do is run algorithms that calculate, based on the load distribution you've specified, a mapping that says: okay, we are going to need this number of instances for this test. And I should say it's also possible for individual customers, if they talk to us, to tweak this a little bit: how many k6 instances should run on one server, for example. This matters, because we work from an average, but depending on the type of test you run, you may need more memory or things like that. But by default, this is what happens. Then we have a mixture of trying to reuse existing instances where possible (we try to be clever about it: if we already have a bunch of instances just idling in a given load zone, we use those), and otherwise we spin up new ones and basically sit and wait until they have properly responded. So basically what happens is that k6 has a little web server in it; it has an API you can call. In principle, when you run k6, you can tell it: now move up to 10 virtual users, and now you can use 20. What we have is a custom little program, called the k6 agent, that mimics you, the user. Say you have 20 instances spread out across the world, and together they should represent a load test of, say, a hundred thousand virtual users. Each individual k6 doesn't know anything about the other ones; they're all running in parallel. So each k6 agent is just told to tell its k6 to scale the numbers according to the plan, which we have split across all of the instances. And as that happens, together they all create the test run that you asked for. So that's the basic underpinning. But it is quite important, I think, to note that we are using exactly the normal k6 in the cloud, the same open-source tool that everyone has access to. We don't have a secret sauce, or a closed version, or a separate branch. We may use the bleeding edge, of course, but that is the same bleeding edge that will be available to everyone. So there's nothing proprietary about the k6 we run.

- You mean we don't save the super secret special features for the paid version? What?

- No. It's the same k6. We may be a little bit faster to make use of new things, maybe, but not that much more.
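A rough sketch of the agent pattern Samuel describes: when k6 runs, it serves a small REST API (on localhost:6565 by default) that an external program can use to pause, resume, and scale the instance. The `/v1/status` resource below follows k6's documented REST API, but the controller code, names, and the instance split are illustrative assumptions, not k6 Cloud's actual agent; adjusting VUs at runtime also assumes the script allows it (for example, via k6's externally-controlled executor).

```javascript
// Start each k6 instance paused so the controller owns the ramping:
//   k6 run --paused --address localhost:6565 script.js

// Tell one running k6 instance to run `vus` virtual users via its REST API.
async function setVUs(k6Address, vus) {
  await fetch(`http://${k6Address}/v1/status`, {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      data: { type: 'status', id: 'default', attributes: { paused: false, vus } },
    }),
  });
}

// Each instance only ever sees its own slice of the plan; none of them knows
// the others exist. 100,000 VUs over 20 instances => 5,000 VUs each.
async function rampAll(instanceAddresses, totalVUs) {
  const perInstance = Math.floor(totalVUs / instanceAddresses.length);
  await Promise.all(instanceAddresses.map((addr) => setVUs(addr, perInstance)));
}

rampAll(['localhost:6565'], 10).catch(console.error);
```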
- So Pawel, tell me what the motivation was for adding more execution options, instead of just running tests on our cloud. What did users want, and why did they want it that way?

- Right. Yeah. So we have developed the private load zone feature, and I think that's what you're asking about, Nicole.

- Yeah. In a very leading, subtle way, I thought.

- So some customers came to us saying they're not able to use k6 Cloud, because k6 Cloud runs the load generators within our own infrastructure, and therefore they cannot reach endpoints or services that are private, that are not exposed to the internet. That's been a major blocker for some users in adopting the cloud. They have been using k6 locally, because locally they can of course reach the private services, but they couldn't use the cloud. So naturally we decided to implement a feature called the private load zone. It's a load zone that is deployed within the infrastructure of the customer, and it's only available to that one customer. k6 Cloud is able to connect to that infrastructure and spin up instances of k6, which are then able to reach all of the private services that the customer has allowed. Of course it can still be firewalled, but the customer is able to reach those internal services. So that's one reason. The other reason is security, and maybe that's related: customers don't want to expose private services to the internet, and security is improved if we are able to run within the infrastructure of the customer.

- Sam, could you show us how that works, or how it can be used? It's already live, right?

- It is already live. It does require users to actually talk to us and set something up, because it is done in their AWS cloud. The process would basically be for them to give us particular permissions to start spinning up instances in their private cloud. As for how it works on our end, the changes we had to make are that, in the past, we only used our own cloud, so we knew we had access to it. Now we need to start passing around the credentials that the user has given us. It also means that when we spin up a load generator, we need to set up separate private keys, et cetera, for instances talking to us from outside of our system; they should not have access to the private keys that we use for our other instances. So the instances need to be able to call home and sync up with us in a secure way, so that there's secure communication with this new, external cloud. That's the additional complexity on our end. But I'm sharing the screen here now, if you can see it.

- Maybe. Okay, sure.

- Could you maybe zoom in a little bit?

- Sure. Is that better? So this is an example from the app, where I have created a test run with support for a private load zone. There's really not much to it. This is probably something you're familiar with if you've made a k6 test run in the past: you set up the distribution with different load zones. An interesting thing here is that you can actually mix and match a private load zone with public ones. Up here is this custom load zone. This is a testing one we have right now, but if you had your own load zone here, it would be named something like your company on Amazon in the EU, for example.
So this is just the internal name we would agree on for you to use for your load zone, but you can also mix it with existing ones. These are the existing public Amazon load zones that anyone can use, in Paris and in the US here. And that's basically it. If you run this, it might take a little while before it gets to the point of showing data, because in this case we will probably not be able to reuse an existing instance and will have to spin up a new one. Well, it was pretty quick this time, actually. Once we get data, this is just the normal view. You can see we actually have this little symbol up here, which means it's a private one: Dublin, at 3%, is running this private load zone, and you can see in the tooltip here that it is a private one. But you also get data from the public ones if you want. From your perspective, using this is very straightforward: once you have the load zone set up with us, you just run it and you get the data. This is a very simple test, but.

- Wow. And so are there any differences with regard to the metrics? Are they still being aggregated the same way from the private load zones as from the public ones?

- No, there is no difference. It's all operating the same way; it's just that you are now running the agent in your cloud. There's no difference in how it's presented or how it's aggregated or anything. This is why I'm showing that you can mix them here: they all come in and all look the same. Once we receive them, we don't care where they're coming from; we ingest and treat them in the same way. So yeah, that's it for the demo. That's all there is to it, actually.

- So are there any concerns with this when, for example, the load generator for that particular private load zone is within the infrastructure of the company? Could it be something on-premises, or is it AWS cloud only?

- This is AWS only at this time. That's the only provider we support right now.

- However, I think somebody else on the team, Chenko, is going to be working on a related feature quite soon, right? To also be able to run distributed tests within on-premise infrastructure.

- Right, yeah. That's another feature related to private load zones: distributed execution of k6 open source, which we already have with the k6 operator. So you can have a distributed load test without using our cloud, but that distributed execution couldn't output data to our cloud; that was the limit. Some users wanted to use distributed execution within their own infrastructure but output metrics to our cloud. Because, as you said, Nicole, it is quite a hassle to manage all of the data, store results, and visualize and analyze them, and that's the big selling point of k6 Cloud. So that's another feature coming up soon. We hope to release it within some days or weeks; we are currently stress testing it and verifying that it works as expected. But it already works within our infrastructure; we've been testing it for the last few weeks. And it's a hugely important feature to some customers, because when you use a private load zone today with k6, we still have access to your test script.
We can view it on the k6 Cloud side. But with distributed execution with the k6 operator, we wouldn't get access to any data of the customer except for metrics, and metrics typically don't contain any confidential information. So from a security perspective, that is going to be a great feature for many of our users.

- I guess we would get the metrics, but also the script, right? If it's being fired off from k6 Cloud.

- Well, with the k6 operator, no, we wouldn't.

- Oh, yes. Sorry, with the private load zones.

- With private load zones, yes.

- With the private load zone, we also get the script, because we still need to actually spin things up. With a private load zone, the actual running of the test happens on the user's infrastructure, but you can still start the test from the cloud, for example, and for that we need to have the script. That's a different thing from the operator that Pawel is talking about, where it's basically just our k6 ingest. It's almost like doing k6 with --out cloud, except you can do it for many more k6 instances, pretty much.

- Yeah.

- So how would a customer get started with this? Talk to us, I guess. Talk to support is the first thing, at [email protected]. And then, I think we discussed in a previous meeting that support will eventually be handling it, but at the beginning, because this is a very new feature even to us, it will probably be some sort of collaboration with the backend team as well, right?

- It's actually set up to not require backend work at all at this point. We will help the CS team, of course, if they have questions about it, but no code changes or anything are required for this to work. It should just work.

- Yeah. I was thinking that they might have questions from customers who don't know how to set things up in AWS, because we need very specific permissions. k6 Cloud has to be able to start instances and terminate them, and there also has to be a way for those instances in their private load zone to communicate back to k6 Cloud for the metrics, right? So okay, they have to do this configuration and set up their AWS account, and then what?

- At that point, once that is set up and they have the subscription that allows them to use it (there is a manual step from CS to allow the user to use it), they can just specify the load zone in their script, as I showed before, and it will just work.

- So once this is set up, will it also come up in the test builder, or just in the script? You know how in the test builder you have the load zones and you can normally select which one you want.

- No, I think that will not show up in the test builder. You will need to specify it manually at this time.

- Okay. Well, I guess the workaround for that is to do it in the test builder with whatever load zone, convert it to the script, and then just change the load zone there to the private load zone. With the one that you showed, it was a single line that changed, which basically just identified the private load zone. Right?

- Yeah.

- Exactly.
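For reference, here is a sketch of what that single line looks like in a script's cloud options, mixing a private load zone with public ones. The private zone identifier (`acme:amazon:eu-west-1`) is a made-up placeholder for whatever internal name you agree on with the k6 team; the public zone IDs and the `ext.loadimpact.distribution` shape follow the usual k6 Cloud options format.

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  vus: 20,
  duration: '1m',
  ext: {
    loadimpact: {
      distribution: {
        // Hypothetical private load zone name agreed on with the k6 team.
        private: { loadZone: 'acme:amazon:eu-west-1', percent: 50 },
        // Regular public Amazon load zones that anyone can use.
        paris: { loadZone: 'amazon:fr:paris', percent: 25 },
        ashburn: { loadZone: 'amazon:us:ashburn', percent: 25 },
      },
    },
  },
};

export default function () {
  http.get('https://test.k6.io'); // placeholder target
  sleep(1);
}
```

Running it with `k6 cloud script.js` executes it in the cloud, while `k6 run --out cloud script.js` runs it locally and streams the metrics in, as discussed earlier.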
- But I would say that if this doesn't work yet with the test builder, it will work very soon with the test builder, because of course it should.

- Yeah, no, this is really, really cool. I think you already mentioned some of the reasons why this is so useful: the security part, and the related reason that there are many applications that are behind a firewall or not publicly accessible. But what were the challenges? Why haven't we done this already? Why was this piece of work difficult?

- Right. It looks so simple.

- I know, right?

- One line. I mean.

- Is Samuel just slacking off here? Like, what's going on?

- Yep, that's it. No, but of course it all comes down to priorities, and to people actively asking for this or really pushing for this requirement. We're starting to get customers that are more and more interested in this kind of thing. But it does require a lot of planning and thinking to figure out how to do this on the cloud side, so it is more complex than it looks. We have the advantage of a pretty modular system, in the sense that we can actually plug things in and reuse the same machinery. You asked whether there's a difference between the metrics coming in from a private load zone and another load zone, and there is not. That does make it easier for us to do this kind of expansion, because everything goes through the same pipeline and we can plug different things into it. So the design of the backend system helps us here. But nevertheless, figuring out how to actually do this, and in a way that is safe for other customers, so that we don't just share access to instances willy-nilly but actually track and maintain secure credentials for each of these load zones separately, et cetera: it takes a lot of planning to do this in a good way without being too patchy or hacky.

- Yeah. So you've alluded to the fact that it's not just the aggregation of the results that is a concern to be addressed, but also the storage. Are you able to share how big the database is for all k6 Cloud results?

- I looked it up just before the call, because I knew, Nicole, that you would ask. I was quite surprised by the number, because when I looked about six or eight weeks ago, the database size was 30 or 32 terabytes, I believe. And I looked today and it was only 11. It's quite a big difference, because we do delete data after the retention period ends. For most subscriptions we store data for 30 days by default, but some customers have one or two years of data retention. So it must be that one of the bigger data sets was dropped recently because it expired. But yeah, it ranges; I remember a time when we had more than 50 terabytes. It goes up and down.

- I wonder if it's also seasonal. What happened this time six months ago? Just before Christmas, perhaps, so maybe lots of companies were testing their Christmas sales, e-commerce and that kind of thing. Yeah. Wow. 11 terabytes.
Even 11 is already a lot. That's already the size of my video collection here for editing. When you think that this is all text, that is a lot.

- It's mostly numbers, actually: integers and floats. Just a lot of them.

- Wow. Okay. I also wanted to mention, by the way, Samuel, that when you were showing the instances starting on k6 Cloud: that was one of my first experiences with k6, before I even applied for a job. This was when I was, you know, doing some spy stuff. No, I was just trying it out, because I had never used it before, but I was so impressed that the starting and provisioning of an instance was so fast. It was so impressive that it was one of the first things I wanted to know once I joined k6: how is it that we are faster than Amazon? Because when you start an instance on Amazon, it's way slower than that, even if it's a single instance. And I did get the answer that we reuse those load generators. So how does that work? Just for my own curiosity: have you looked and seen if there are any memory leaks or something? Is there a downside to reusing already existing instances?

- So yeah, as I said, we reuse instances if they have not been idle for too long. We will close down instances if they are sitting around with no one picking them up. And there are risks, of course. We do need to clean up instances that die for various reasons, and that does not necessarily have to be our fault; it can be something in the network or in the infrastructure that fails. So you need to be careful not to leave leftover instances behind. But overall, we track and keep a list of instances that are currently available. When an instance has been used to run a test, it goes into a state where it says: okay, I'm ready to do something else if you want me. If it doesn't wait too long in that state, we will reuse it. But it varies. If you are running a big test, it is very likely that you are going to be spinning up at least some new instances, and then it's going to take a bit longer to get all the resources available. If you're unlucky, I guess you could end up with that even for a small test run. But most of the time, at least for a small test run, we are able to reuse one, and then it can start pretty quickly.

- I would add that this is a metric that we track quite closely: how long does it take for us to spin up a test for the user? On average, at the moment, it's about 26 seconds between you clicking to run a test and us receiving the first results for the test. So this is pretty good, but we hope to do better. As you know, Nicole, we care a lot about developer experience, and it's super important that tests start quickly. We are probably faster than any other SaaS competitor; I don't know for sure, so maybe not, but we still want to be much faster. So we are pooling those instances and keeping them up and running, and maybe that's not economically efficient, but it is definitely good for developer experience.
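The reuse logic Samuel outlines (finished instances report ready, get handed out again unless they have idled too long, and anything missing is provisioned fresh) is essentially a warm pool. Below is a small sketch of that pattern; the names, the timeout value, and the provisioning stub are assumptions for illustration, not the actual k6 Cloud service.

```javascript
const IDLE_TIMEOUT_MS = 5 * 60 * 1000; // assumed idle window before teardown

const pool = []; // ready instances as { id, readySince }, oldest first

// Called when an instance finishes a test and reports healthy again.
function release(instance) {
  pool.push({ ...instance, readySince: Date.now() });
}

function acquire(needed, provision) {
  const now = Date.now();
  // Evict instances that sat idle past the timeout; those get terminated.
  while (pool.length > 0 && now - pool[0].readySince > IDLE_TIMEOUT_MS) {
    const idle = pool.shift();
    console.log(`terminating idle instance ${idle.id}`);
  }
  // Reuse warm instances first: these can start a test almost immediately.
  const reused = pool.splice(0, Math.min(needed, pool.length));
  // Provision the remainder (stubbed here) and wait for them to report healthy.
  const fresh = provision(needed - reused.length);
  return [...reused, ...fresh];
}
```

A big test exhausts the warm pool and pays the provisioning cost for the remainder, which matches the behavior described above: small runs usually reuse and start fast, large ones wait on fresh instances.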
- There are also other things that contribute. I mean, we can't really affect the spin-up time of an AWS instance, but we also need to do other processing before your test can start, right? We need to validate your subscription, check that the test is okay, and so on. These things can be optimized. We are doing them today in a way that is very recoverable, so to speak: if there is an issue in our system, we can retry in a safe way, and you will usually not get stuck; at least it should be very unlikely for you to get stuck in some state. But nevertheless, this does add time to those 26 seconds that Pawel is talking about, so there are improvements to be made on the backend side there as well.

- So Nicole, a question for you, because you know better than we do: how long does it take for other SaaS players to start up a test?

- Having worked for one previously, the standard response was that you should wait up to five minutes. Anything between one and five minutes was normal, and after that, if you still hadn't had your instances start, contact support. So yeah, 26 seconds, did you say? That is fantastic. I have experienced something that was similarly fast. The problem there, though, was that that particular SaaS provider was able to do it because it was shared infrastructure. That was a huge problem, because you would run a test and not know what the other people running on the same load generator were doing. That's completely different, and not worth the savings in time, in my opinion. But that was how that SaaS platform was able to justify leaving the load generator up, and how they improved the startup time. This way is way better, because it's still private: nothing else is run besides the script that you are using on that load generator. So it's way better.

- Yeah. I mean, one of the issues that can happen, because the user can customize the script as they want, is that a common newcomer's mistake is to not put a sleep or something in the script. Then you are basically running an infinite loop, running very, very fast and using all the CPU. It's a useless test, and it would kill any other instances of things running if they were shared on the same machine.

- And it's a very common mistake, too, to accidentally load test your load generator instead of the application.

- Exactly. So yeah, it would be very bad to have them shared. You couldn't really trust it; it would be very nondeterministic for you to know whether your test fails or not, in that case.
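The "no sleep" mistake looks like this in practice: without pacing, each VU iterates in a tight loop and burns CPU on the load generator itself. A minimal k6 script with the fix, using k6's public demo site as the target:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export default function () {
  http.get('https://test.k6.io');
  // Without this sleep, each VU loops as fast as the CPU allows, which
  // stresses the load generator more than the system under test.
  sleep(1);
}
```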
- An interesting aspect of this is that I wonder how the perception of performance changes because of that awesome animation we have on k6 Cloud, which I love so much. It looks like things are really happening, and it's a really nice visual. It's kind of like how the same experience with a progress bar will be perceived as faster, even if it's actually the same.

- That's so funny, because the perception of tests starting quicker is actually, I think, negatively impacted by this. We run this animation right up until we start showing results. So the test probably has started, but the animation is still running.

- Yeah. Actually, if you saw the demo, it switches to running status a good while before the first data actually comes in. So the test is actually starting there; it just hasn't produced any results yet. We just don't switch the view until there's actually data available.

- Just another example of how humans can be irrational.

- Yeah.

- Wow. But anyway, we're actually out of time. That was really interesting; I learned a lot about how k6 Cloud works. So I want to thank both of you for coming. Do you have any parting words about the backend team or the future of this private load zones feature?

- No. Well, apart from: try it out, and talk to us if you want to use it. And you know, if there are any issues, normally blame the frontend.

- I thought you were gonna say blame Pawel.

- No, no.

- It's always a frontend bug. There is nothing. Oh.

- Pawel, anything from you?

- Come and join us if you want to work on features like this. Come and work with us, I guess.

- Yeah, that's the link over there. Thank you both for coming, and for those who are watching, thank you for sticking around, and have a great weekend. Everybody, see you next week.

- Bye. Bye.