%% Last Updated: [[2021-02-14]] %%

# PRINCIPLES OF CHAOS ENGINEERING - Principles of Chaos Engineering

[Readwise URL](https://readwise.io/bookreview/7721656) | [Source URL](https://principlesofchaos.org/)

---

![](https://readwise-assets.s3.amazonaws.com/static/images/article1.be68295a7e40.png)

---

How much confidence can we have in the complex systems that we put into production? ^145418655

**References:** https://instapaper.com/read/1386512250/15501443

---

An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. We call this Chaos Engineering. ^145418656

**References:** https://instapaper.com/read/1386512250/15501447

---

> These experiments follow four steps:
>
> 1. Start by defining 'steady state' as some measurable output of a system that indicates normal behavior.
> 2. Hypothesize that this steady state will continue in both the control group and the experimental group.
> 3. Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
> 4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
>
> The harder it is to disrupt the steady state, the more confidence we have in the behavior of the system. If a weakness is uncovered, we now have a target for improvement before that behavior manifests in the system at large. ^145418657

**References:** https://instapaper.com/read/1386512250/15501450

---

> Build a Hypothesis around Steady State Behavior
>
> Focus on the measurable output of a system, rather than internal attributes of the system. ^145418658

**References:** https://instapaper.com/read/1386512250/15501454

---

> Vary Real-world Events
>
> Chaos variables reflect real-world events.
Prioritize events either by potential impact or estimated frequency. ^145418659

**References:** https://instapaper.com/read/1386512250/15501457

---

Run Experiments in Production ^145418660

**References:** https://instapaper.com/read/1386512250/15501460

---

To guarantee both authenticity of the way in which the system is exercised and relevance to the current deployed system, Chaos strongly prefers to experiment directly on production traffic. ^145418661

**References:** https://instapaper.com/read/1386512250/15501461

---

> Automate Experiments to Run Continuously
>
> Running experiments manually is labor-intensive and ultimately unsustainable. ^145418662

**References:** https://instapaper.com/read/1386512250/15501462

---

Minimize Blast Radius ^145418663

**References:** https://instapaper.com/read/1386512250/15501464

---

While there must be an allowance for some short-term negative impact, it is the responsibility and obligation of the Chaos Engineer to ensure the fallout from experiments is minimized and contained. ^145418664

**References:** https://instapaper.com/read/1386512250/15501466

---
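
The four experiment steps in the highlights can be sketched as a small simulation. This is only an illustration under stated assumptions, not a method given by the source: `Cluster`, `success_rate`, `run_experiment`, the ten replicas, the 2% tolerance, and the 10% experimental traffic share are all hypothetical names and numbers, and the toy cluster stands in for a real production system.

```python
import random
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy stand-in for a replicated service (hypothetical, for illustration)."""
    replicas: int = 10
    crashed: int = 0   # replicas taken down by the injected fault
    retries: int = 0   # extra attempts a client makes after a failed one

    def handle_request(self, rng: random.Random) -> bool:
        # Each attempt lands on a random replica; indices below `crashed` are down.
        for _ in range(self.retries + 1):
            if rng.randrange(self.replicas) >= self.crashed:
                return True
        return False


def success_rate(cluster: Cluster, requests: int, rng: random.Random) -> float:
    """Step 1: steady state as a measurable output (fraction of successful requests)."""
    ok = sum(cluster.handle_request(rng) for _ in range(requests))
    return ok / requests


def run_experiment(retries: int, requests: int = 10_000,
                   experiment_fraction: float = 0.1,
                   tolerance: float = 0.02, seed: int = 0) -> dict:
    rng = random.Random(seed)

    # Minimize blast radius: only a small share of traffic sees the fault.
    experiment_requests = int(requests * experiment_fraction)
    control_requests = requests - experiment_requests

    control = Cluster(retries=retries)
    # Step 3: introduce a real-world event, here one crashed replica.
    experiment = Cluster(retries=retries, crashed=1)

    control_rate = success_rate(control, control_requests, rng)
    experiment_rate = success_rate(experiment, experiment_requests, rng)

    # Steps 2 and 4: the hypothesis is that both groups keep the same steady
    # state; a gap larger than `tolerance` disproves it.
    deviation = abs(control_rate - experiment_rate)
    return {
        "control_rate": control_rate,
        "experiment_rate": experiment_rate,
        "hypothesis_holds": deviation <= tolerance,
    }


if __name__ == "__main__":
    print("no retries:", run_experiment(retries=0))
    print("two retries:", run_experiment(retries=2))
```

With these made-up numbers, `run_experiment(retries=0)` should report that the hypothesis is disproved, since roughly one request in ten hits the crashed replica, while `run_experiment(retries=2)` should report that the steady state holds. The first result is the kind of weakness the highlights describe as a target for improvement.

---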