PRINCIPLES OF CHAOS ENGINEERING - Principles of Chaos Engineering

# PRINCIPLES OF CHAOS ENGINEERING - Principles of Chaos Engineering

![rw-book-cover](https://readwise-assets.s3.amazonaws.com/static/images/article1.be68295a7e40.png)

URL:: https://principlesofchaos.org/
Author:: principlesofchaos.org

## Highlights

> How much confidence we can have in the complex systems that we put into production? ([View Highlight](https://instapaper.com/read/1386512250/15501443))

> An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. We call this Chaos Engineering. ([View Highlight](https://instapaper.com/read/1386512250/15501447))

> These experiments follow four steps:
>
> 1. Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
> 2. Hypothesize that this steady state will continue in both the control group and the experimental group.
> 3. Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
> 4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
>
> The harder it is to disrupt the steady state, the more confidence we have in the behavior of the system. If a weakness is uncovered, we now have a target for improvement before that behavior manifests in the system at large. ([View Highlight](https://instapaper.com/read/1386512250/15501450))

> Build a Hypothesis around Steady State Behavior
>
> Focus on the measurable output of a system, rather than internal attributes of the system. ([View Highlight](https://instapaper.com/read/1386512250/15501454))

> Vary Real-world Events
>
> Chaos variables reflect real-world events. Prioritize events either by potential impact or estimated frequency.
> ([View Highlight](https://instapaper.com/read/1386512250/15501457))

> Run Experiments in Production ([View Highlight](https://instapaper.com/read/1386512250/15501460))

> To guarantee both authenticity of the way in which the system is exercised and relevance to the current deployed system, Chaos strongly prefers to experiment directly on production traffic. ([View Highlight](https://instapaper.com/read/1386512250/15501461))

> Automate Experiments to Run Continuously
>
> Running experiments manually is labor-intensive and ultimately unsustainable. ([View Highlight](https://instapaper.com/read/1386512250/15501462))

> Minimize Blast Radius ([View Highlight](https://instapaper.com/read/1386512250/15501464))

> While there must be an allowance for some short-term negative impact, it is the responsibility and obligation of the Chaos Engineer to ensure the fallout from experiments are minimized and contained. ([View Highlight](https://instapaper.com/read/1386512250/15501466))

---
Title: PRINCIPLES OF CHAOS ENGINEERING - Principles of Chaos Engineering
Author: principlesofchaos.org
Tags: readwise, articles
date: 2024-01-30
---
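The four-step experiment quoted in the highlights can be sketched as a comparison of one steady-state output between a control group and an experimental group. This is a minimal illustration that assumes several things the source does not specify:

- the steady-state metric is request success rate;
- simulated request outcomes stand in for real telemetry;
- `tolerance` is a hypothetical threshold for "no difference".

None of the names below come from the source.

```python
import random
from dataclasses import dataclass


@dataclass
class GroupResult:
    """Observed request outcomes for one group (control or experimental)."""
    name: str
    successes: int
    total: int

    @property
    def success_rate(self) -> float:
        # Steady state is a measurable *output* of the system (the share of
        # requests that succeed), not an internal attribute such as CPU load.
        return self.successes / self.total if self.total else 0.0


def simulate_group(name: str, requests: int, failure_prob: float,
                   rng: random.Random) -> GroupResult:
    # Hypothetical stand-in for real telemetry: each request independently
    # fails with probability `failure_prob`.
    successes = sum(1 for _ in range(requests) if rng.random() >= failure_prob)
    return GroupResult(name, successes, requests)


def hypothesis_holds(control: GroupResult, experiment: GroupResult,
                     tolerance: float) -> bool:
    # Step 4: try to disprove the hypothesis by looking for a difference in
    # steady state between the groups. A gap larger than the assumed
    # tolerance disproves it and exposes a weakness to fix.
    return abs(control.success_rate - experiment.success_rate) <= tolerance


if __name__ == "__main__":
    rng = random.Random(42)
    # Steps 1-2: steady state = success rate; hypothesis = it stays within
    # one percentage point in both groups (the tolerance is an assumption).
    control = simulate_group("control", 10_000, failure_prob=0.001, rng=rng)
    # Step 3: introduce a real-world event in the experimental group only,
    # modelled here as a higher per-request failure probability.
    experiment = simulate_group("experiment", 10_000, failure_prob=0.002, rng=rng)
    print(f"control={control.success_rate:.4f} experiment={experiment.success_rate:.4f}")
    print("hypothesis holds:", hypothesis_holds(control, experiment, tolerance=0.01))
```

The check deliberately looks only at the output metric, which mirrors the "Build a Hypothesis around Steady State Behavior" highlight.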
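The "Vary Real-world Events" highlight says to prioritize events either by potential impact or by estimated frequency. The sketch below ranks a small catalogue by one chosen key. The event names echo the examples in the four-step highlight (crashed servers, malfunctioning hard drives, severed network connections). The impact and frequency scores are made-up illustrative values, not data from the source.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import List, Literal


@dataclass(frozen=True)
class ChaosEvent:
    """A real-world event to inject; the scores below are hypothetical estimates."""
    name: str
    impact: float              # assumed 0..1 scale of potential damage
    frequency_per_year: float  # assumed estimate of how often it occurs


# The principle allows ranking by either potential impact or estimated frequency.
_PRIORITY_KEYS = {
    "impact": attrgetter("impact"),
    "frequency": attrgetter("frequency_per_year"),
}


def prioritize(events: List[ChaosEvent],
               by: Literal["impact", "frequency"]) -> List[ChaosEvent]:
    # Highest-priority event first; an unknown key raises KeyError.
    return sorted(events, key=_PRIORITY_KEYS[by], reverse=True)


catalogue = [
    ChaosEvent("server crash", impact=0.6, frequency_per_year=50),
    ChaosEvent("hard drive malfunction", impact=0.4, frequency_per_year=12),
    ChaosEvent("severed network connection", impact=0.9, frequency_per_year=4),
]

print([e.name for e in prioritize(catalogue, by="impact")])
# ['severed network connection', 'server crash', 'hard drive malfunction']
print([e.name for e in prioritize(catalogue, by="frequency")])
# ['server crash', 'hard drive malfunction', 'severed network connection']
```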
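One possible way to act on "Minimize Blast Radius" is two safeguards:

- Expose only a small, fixed slice of users to an experiment.
- Stop automatically once harm exceeds an agreed budget.

The sketch assumes user IDs are hashed into buckets, a 1% exposure fraction, and a 2% error-rate abort threshold. The helpers `in_experiment` and `should_abort` are hypothetical names, and these values are illustrative choices rather than anything the source prescribes.

```python
import hashlib


def in_experiment(user_id: str, fraction: float) -> bool:
    """Deterministically place roughly `fraction` of users in the experimental group.

    Hashing the ID (rather than sampling per request) keeps each user in the
    same group for the whole run, so any fallout stays within a fixed,
    small population.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return value < fraction * 2**64


def should_abort(errors: int, requests: int, max_error_rate: float) -> bool:
    # Hypothetical kill switch: stop injecting failures once the experimental
    # group's error rate exceeds the agreed budget.
    return requests > 0 and errors / requests > max_error_rate


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    exposed = sum(in_experiment(u, fraction=0.01) for u in users)
    print(f"{exposed} of {len(users)} users exposed")  # roughly 1%
    print("abort:", should_abort(errors=30, requests=1_000, max_error_rate=0.02))
```

Deterministic bucketing makes the affected population reproducible. It also keeps that population bounded, which is one reading of keeping fallout "minimized and contained".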