Kubedim - Self-adaptive service degradation of microservices-based systems (lit)

Author: [[Kelvin Zhang]] - [site: kz.hn](https://kz.hn/); [github](https://github.com/kz)

## Sources

This page contains my literature notes from three sources:

### The paper

Zhang, K. (2021). _Kubedim - self adaptive service degradation of microservices-based systems_. Imperial College London. Retrieved from https://iamkelv.in/assets/thesis.pdf .

### The video

In the YouTube video below, Kelvin Zhang defends his thesis (above).

<iframe width="560" height="315" src="https://www.youtube.com/embed/epDJnEytXvM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

### Conversation with the author

On August 13th, 2021, I invited Kelvin to talk to me and [Daniel González Lopes](https://www.gonzalezlopes.com/) about his thesis on [[k6 (tool)|k6]] Office Hours:

<iframe width="560" height="315" src="https://www.youtube.com/embed/O1Evg9_EEmU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

## Notes

Highlights in [[Kubedim - self-adaptive service degradation of microservices-based systems]]

### Previous approaches

- limited by cost of scaling, performance bottlenecks in code, and high response times at high load
- orthogonal or uniform approach to reduce optional components when the application is under load

### [[Microservices]]

Microservices-oriented architecture
- versatile, but also difficult to scale because of complexity

[[Circuit breaker pattern]] for microservices [^nerdfortech]
- In microservices-oriented architecture, a component that retries calls to another component that is down may exacerbate the availability issue.
- Instead, a circuit breaker pattern adds a proxy between the two components.
When the number of failures exceeds a threshold, the circuit breaker "trips" and does not pass on requests until after a specified timeout period, so that availability does not worsen.
- weaknesses:
    - indiscriminate as to the type of request, so critical requests are throttled too
    - fix: _brownout_: separate required and optional, and _then_ apply circuit breaker to optional
        - however, feature interaction analysis is required for this because it changes the request mix (more required/complex requests than less complex/optional)

![[Microservices.png]] [^richardson]

- microservices are separate, loosely coupled services that together make up a single application
- advantages vs monolith - [[Microservices vs Monoliths]]
    - Monoliths are limited when it comes to scalability, because you have to scale the entire stack instead of being able to do it on a service-by-service basis
    - Each service is independently deployable and scalable
- Microservices are usually deployed in lightweight containers
- [[Containerization|Container]] [[Ops Tools#Container Orchestration Engines|orchestrators]] like [[Kubernetes]] are then used to manage the containers and coordinate deployment

![[Visualization-of-kubernetes-cluster.png]] [^zhang]

The image above was part of Kelvin's thesis, adapted from and credited to the image below by Nginx:

![[visualization-of-kubernetes-cluster-original.png]] [^nginx]

Kubernetes
- pods
- nodes
- deployments
- services
- ingress controller
    - applies routing rules, typically for load balancing
- configmaps
    - objects that declaratively set configuration values and specify how a container can access them

### Goals of project

- Improve the use of [[Dimming|brownout]] in existing control theory approaches and make it easier to make good brownout decisions (taking into account the mix of requests)
- Create a workflow that DevOps can use for businesses

Contributions
- brownout-enabled reverse proxy that can be integrated into Kubernetes, designed with
developer usability in mind
- two knowledge-based brownout strategies
- user manual for Kubedim

### Availability techniques

#### Control theory

- optimizing the operation of a system or application by systematically keeping its controlled variables as close to the desired state as possible [^wikipedia]

#### Techniques in cloud environments used to maintain a high quality of service

- auto-scaling
- self-healing: Container orchestrators (k8s) recreate pods if they detect failures or deviations from the configuration
- load balancing
    - Even better if brownout-aware

#### Circuit-breaker pattern and brownout

- A retry storm can lead to an unintentional DoS attack from within, so a circuit breaker prevents that. A concern in dimming is that dimming requests of a certain kind will change the load distribution of the traffic, and it is difficult to determine how this will further affect the system under test.

#### "Self-adaptive" software

- achieved by creating a feedback loop that helps the software improve and adjust tactics
- An _adaptation manager_ is responsible for handling the feedback loop.
- Common feedback loop design: Monitor-Analyse-Plan-Execute (MAPE)

Filieri et al.'s method for implementing a control loop:
1. Identify target setpoint.
2. Identify knobs to turn to get an application to the setpoint.
3. Create a model: which knobs to turn to try to get to setpoint.
4. Design a controller (time-based, knowledge-based).
5. Implement controller and test system.

### Statistical analyses

#### Least squares estimation

Given a set of input variables, least squares estimation determines the coefficients for those variables that minimize the sum of squared prediction errors. Examples of input variables could be the number of replicas for a pod, the amount of CPU and memory allocated to the node, etc.

#### Auto-regressive modelling

Auto-regressive modelling attempts to predict a future value based on patterns found in past values.
For example, this type of modelling could be good for predicting CPU spikes, where future spikes are influenced by previous ones.

### Sample microservices applications

These applications were created to be reasonably realistic examples of typical microservices-based architecture. They are made of several components, some in different languages.

- [Sock Shop](https://github.com/microservices-demo/microservices-demo)
    - polyglot architecture
    - limitation: there are minor bugs, particularly in the checkout process - you can only check out if the items in your cart total less than a certain value.
- [TeaStore](https://github.com/DescartesResearch/TeaStore)
    - Docker and Kubernetes
    - particularly good for [benchmarking](https://github.com/DescartesResearch/TeaStore/wiki/Testing-and-Benchmarking) and resource management
    - tested with LIMBO and JMeter, and the methodology for these tests is available
- [TrainTicket](https://github.com/FudanSELab/train-ticket)
    - Docker and Kubernetes
    - Can be used in conjunction with, for example, [[Jaeger]] for tracing.
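To make the two statistical techniques above more concrete, here is a minimal sketch that fits a first-order auto-regressive model, `x[t] = a + b * x[t-1]`, using ordinary least squares. It is not taken from the thesis: the function names, the choice of a first-order model, and the CPU samples are all assumptions made for illustration.

```python
def fit_ar1(series):
    """Fit x[t] = a + b * x[t-1] by ordinary least squares.

    Hypothetical helper for illustration; not Kubedim's implementation.
    """
    if len(series) < 3:
        raise ValueError("need at least three samples")
    xs = series[:-1]  # previous values (predictor)
    ys = series[1:]   # following values (target)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("series is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope that minimizes the squared error
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict_next(series, a, b):
    """Predict the next value from the most recent sample."""
    return a + b * series[-1]


# Made-up CPU utilisation samples (percent), constructed so that
# each value equals 8 + 0.8 * the previous one.
cpu = [20.0, 24.0, 27.2, 29.76, 31.808]
a, b = fit_ar1(cpu)
print(a, b, predict_next(cpu, a, b))
```

Because the sample data was generated from a known rule, the fitted coefficients come out very close to that rule. Real metrics would be noisy, and a higher-order model or more input variables might fit better.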
### [[Kubedim]]

- self-adaptive [[Reverse Proxy]] that acts like a traffic controller for optional components while the application is under load
- integrates easily with [[Kubernetes]]
    - previous implementation was as a load balancer ([[Ingress Controller]]), but this is a reverse proxy because production environments typically already use ingress controllers, and ingress controllers don't allow as much in the way of scripting
- non-uniform reduction of availability for optional components
- model-based approach that takes into account feature interactions, bottleneck transfers, and priority of users (only low priority users see a reduction in availability)
- declarative configuration ([[YAML]])
- employs [[Dimming]] or brownout (according to Kelvin, they are interchangeable)

![[Kubedim architecture.png]] [^zhang]

#### Testing setup

##### User flows

- buying - high priority (likely to check out)
- browsing - low priority
- news - lowest priority (blog and delivery updates)

##### Scheduler

Kelvin implemented a custom-built scheduler to do the following:
- add a think time from a uniform distribution (from 2 to 7 seconds)
- send the next request after only the mandatory resources on the page have been loaded (to simulate end-user impatience)
- execute requests using this diagram:

![[zhang-behavioral_diagram.png]] [^zhang]

Tool used: [[k6 (tool)|k6]] v0.30.0

#### Experiments

##### 0: No dimming

In this control experiment, no dimming was applied.

##### 1: Uniform dimming

Brownout starts when the 95th percentile response time exceeds 3s, and the level of dimming then increases as the response time rises.

##### 2: Non-uniform dimming: component weightings

In this experiment, components were given weightings that represented their share of the overall load (higher contribution = higher weighting). Then, requests were prioritized according to the weight, with heavier dimming on higher-weighted components.
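The uniform and component-weighted strategies described above can be sketched roughly as follows. Only the 3s threshold on the 95th percentile response time comes from these notes. The linear ramp, the `saturation` value, the way weights scale the dimming level, and the component names are all assumptions for illustration and are not Kubedim's actual controller.

```python
import random


def uniform_dimming_level(p95_seconds, setpoint=3.0, saturation=6.0):
    """Return a dimming level between 0 and 1.

    Nothing is dimmed at or below the setpoint. Above it, the level rises
    linearly (an assumed shape) and is capped at 1 once the response time
    reaches the assumed saturation point.
    """
    if p95_seconds <= setpoint:
        return 0.0
    return min(1.0, (p95_seconds - setpoint) / (saturation - setpoint))


def weighted_dimming_levels(base_level, weights):
    """Scale the base level per optional component.

    Components with a higher weight (a larger share of the load) are
    dimmed more heavily than the average; lighter ones are dimmed less.
    """
    mean_weight = sum(weights.values()) / len(weights)
    return {
        name: min(1.0, base_level * weight / mean_weight)
        for name, weight in weights.items()
    }


def should_dim(level, rng=random):
    """Decide per request whether to skip an optional component."""
    return rng.random() < level


# Hypothetical optional components and weights.
weights = {"recommendations": 3.0, "news": 1.0}
base = uniform_dimming_level(4.5)
print(base, weighted_dimming_levels(base, weights))
```

With uniform dimming, every optional component would use `base` directly. With component weighting, the same response time leads to different dimming levels per component, which is the difference between Experiments 1 and 2.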
Weightings were chosen in a few ways:
- Developers would choose an initial weighting based on their knowledge of the user flow and business.
- An offline training tool analyzed historical traffic and guided developers on the right weightings to use.
- An online training tool ran A/B tests in production to improve the accuracy of weightings.

##### 3: Non-uniform dimming: user profiling

In this experiment, users were profiled by leaving cookies that tracked their behavior, and then assigned profiles according to how likely they were to purchase items.

##### 4: Non-uniform dimming: combined component weightings and user profiling

This experiment combined component weights and user profiling.

#### Results

On the constant load profile (280 users for 30 minutes, 10-s ramp up):

| Scenario               | Response time (mean) | Items checked out | Recommendations checked out |
| ---------------------- | -------------------- | ----------------- | --------------------------- |
| 0: No dimming          | 5.25s                | 39.4              | 23.2                        |
| 1: Uniform dimming     | 2.97s                | 2313.4            | 887.4                       |
| 2: Component weighting | 2.99s                | 2698.4            | 1439.0                      |
| 3: User profiling      | 2.97s                | 2089.4            | 1056.0                      |
| 4: Combined            | N/A                  | 2144.4            | 1146.4                      |

- Uniform dimming yielded significantly better availability than no dimming
- Component weighting alone (Experiment 2) was the best strategy, with the most items and the most recommendations checked out
- User profiling, alone and combined with component weighting, was still significantly better than no dimming. Both resulted in fewer items checked out than uniform dimming and component weighting alone, although both resulted in more recommendations checked out than uniform dimming

![[kubedim-user profiling results.png]] [^zhang]

![[kubedim-brownout-strategies.png]] [^zhang]

### Limitations

- Kubedim was only tested on one application, Sock Shop, which did not receive "real" load as it was designed for testing/demo purposes.
- Component weightings rely on developers initially setting them, then the training tools would improve them. To some degree, then, the accuracy of the weighting could be completely arbitrary.
- While a [[Flash crowds]] load profile was mentioned, the comparisons were taken during a constant load profile only.
- The experiments were judged on three criteria only: response time, number of items checked out, and number of recommended items checked out.
- Resource utilization (CPU, memory) on both the load generator and the microservices was not mentioned. These are criteria that could significantly impact availability and the accuracy of the test results.
- Load tests executed were only 30 minutes and 10 seconds in duration. Load tests in business environments are often longer, especially for e-commerce sites that may have high load that lasts for one hour or two (especially during a lunch break).
- User profiling, as implemented, is ethically undesirable, which Kelvin tackles in the paper.

### Ideas for future research

- Use multiple live microservices applications in production.
- The online and offline training tools could be useful for workload modelling for [[Load Testing]].
- Non-uniform dimming based on commercial value (higher-usage customers are prioritized) could ultimately prove more impactful for an organization.
- Do studies to see how close developers get to assigning correct weightings.
- Institute a component weighting algorithm based on the training tools.
- Re-run the experiment with different load profiles (spike, stress).
- Add different types of chaos experiments to see how Kubedim responds.
- Explore how Kubedim could work with auto-scaling. At which point should dimming be implemented?
- Kubedim component weighting based on SLAs (as suggested by Daniel). Dim based on which services are approaching SLA breaches.
%%
## My talking points with Kelvin

[k6 Office Hours 24 with Kelvin Zhang](obsidian://open?vault=internal-docs&file=company%2Fdevrel%2FProjects%2FOffice%20Hours%2FkOH%2024%20-%20Kelvin%20Zhang)

<iframe width="560" height="315" src="https://www.youtube.com/embed/O1Evg9_EEmU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

- Who are you? What did you study, where did you study?
- What was your thesis about, and how did you choose the topic?
    - The incumbent way: monoliths
    - The modern way: Microservices-based architecture
        - Advantage against monoliths: scalability, maintenance
        - Concerns regarding microservices: complexity of interactions
    - Motivation for thesis: scalability is a concern for microservices; retrying requests can lead to further degradation (retry storms)
- Kubedim in a nutshell
    - Reverse proxy that acts as a traffic controller for microservices and intelligently scales back availability of some services depending on demand
    - improved circuit breaker with intelligent brownout
    - > a self-adaptive, brownout-enabled reverse proxy which applies brownout strategies on optional components of a cloud application in order to meet both system stability and business objectives - p.23 of thesis
- Goals for the project
    - Improve the use of brownout in existing control theory approaches and make it easier to make good brownout decisions (taking into account the mix of requests)
    - Create a workflow that DevOps can use for businesses
- Dimming: what is it?
    - uniform dimming as a baseline
        - brownout starts when 95th percentile response time > 3s
        - The higher the response time, the higher the level of dimming
    - non-uniform dimming based on:
        - component weightings
            - Components weighted depending on their contribution to the overall load. Higher contribution = higher weighting
        - user profiling
            - Users are profiled based on their session request history, and then low priority users are dimmed while high priority users are not.
            - Relies on developer estimate on which user flows they think would be important
    - Offline (non-prod) training to guide engineers on the component weightings to use
        - Component weightings tool is also useful for writing a load test (workload modelling)
    - Online (production) tool to run A/B tests on weightings and verify them; runs every three minutes
- Related concepts
    - circuit breaker pattern
        - Maybe show diagrams from [Medium - Nerd for Tech](https://medium.com/nerd-for-tech/design-patterns-for-microservices-circuit-breaker-pattern-ba402a45aac2)
        - A circuit breaker stops requests after a certain threshold of failures has been reached
        - Weakness: it doesn't differentiate between critical and optional requests
    - brownout theory
        - Separates required and optional, and *then* applies circuit breaker to the optional
        - Weakness: that changes the load distribution of requests (more critical/complex ones than optional/simpler ones)
- Uniform approach? Why is model-based better?
- How much of an overhead does Kubedim add?
- Results
    - 22% improvement in availability compared to previous approaches
    - constant: 10s ramp up to 280 users, steady for 30 minutes.
        - Baseline dimming disabled: 5.25s
        - Baseline dimming enabled: 2.97s
        - Component weighting enabled vs baseline dimming: more users were able to check out (2700 vs 2300), 2.99s
        - Profiling enabled vs baseline dimming: 2.97s, fewer check outs (2089 vs 2313) but more recommendations.
        - Component weighting + profiling combined: Not worth it; increase in check-outs but also in optional components dimmed
    - flash crowd shape: oscillating from 100 to 280 users in 3-minute intervals
- Why did you choose to use k6?
    - > We choose k6 as our load testing tool for sending load to our reference application.
Unlike prior work like which use Apache JMeter [6] or Locust [35], we use k6 due to its configurability and ability to orchestrate load through its JavaScript and Go APIs. - p. 40
    - Experiences with Locust?
%%

## Citation

```
[^zhang]: Zhang, K. (2021). _Kubedim - self adaptive service degradation of microservices-based systems_. Imperial College London. Retrieved from https://iamkelv.in/assets/thesis.pdf . [[Kubedim_-_self_adaptive_service_degradation_of_microservices-based_systems.pdf|Source note]] and [[Kubedim - Self-adaptive service degradation of microservices-based systems (lit)|literature note]].
```

[^zhang]: Zhang, K. (2021). _Kubedim - self adaptive service degradation of microservices-based systems_. Imperial College London. Retrieved from https://iamkelv.in/assets/thesis.pdf . [[Kubedim_-_self_adaptive_service_degradation_of_microservices-based_systems.pdf|Source note]] and [[Kubedim - Self-adaptive service degradation of microservices-based systems (lit)|literature note]].

[^wikipedia]: _Control theory_. Wikipedia. Accessed in August 2021 from https://en.wikipedia.org/wiki/Control_theory

[^nerdfortech]: Pubudu, N. (2021). *Design patterns for microservices - Circuit breaker pattern*. Retrieved from https://medium.com/nerd-for-tech/design-patterns-for-microservices-circuit-breaker-pattern-ba402a45aac2

[^richardson]: Richardson, C. (2017). _Microservice Architecture pattern_. Accessed August 2021 from https://microservices.io/patterns/microservices.html.

[^nginx]: Rawdat, A. (2020). _Announcing NGINX Ingress Controller for Kubernetes Release 1.8.0 - NGINX_. Accessed 10th August 2021 from https://www.nginx.com/blog/announcing-nginx-ingress-controller-forkubernetes-release-1-8-0/.