%%
Last Updated: [[2021-02-14]]
%%
# Principles of Chaos Engineering
[Readwise URL](https://readwise.io/bookreview/7721656) | [Source URL](https://principlesofchaos.org/)
---

---
How much confidence can we have in the complex systems that we put into production? ^145418655
**References:** https://instapaper.com/read/1386512250/15501443
---
An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. We call this Chaos Engineering. ^145418656
**References:** https://instapaper.com/read/1386512250/15501447
---
> These experiments follow four steps:
>
> 1. Start by defining 'steady state' as some measurable output of a system that indicates normal behavior.
> 2. Hypothesize that this steady state will continue in both the control group and the experimental group.
> 3. Introduce variables that reflect real-world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
> 4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
>
> The harder it is to disrupt the steady state, the more confidence we have in the behavior of the system. If a weakness is uncovered, we now have a target for improvement before that behavior manifests in the system at large.
^145418657
**References:** https://instapaper.com/read/1386512250/15501450
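A minimal Python sketch of the four steps. It assumes the steady-state metric is the success rate of requests (a hypothetical choice, since the principles leave the metric open) and that the real-world event has already been injected into the experimental group. `success_rate`, `run_experiment` and the one-percentage-point `tolerance` are illustrative names and values, not part of any chaos tool.

```python
from typing import Sequence

def success_rate(outcomes: Sequence[bool]) -> float:
    """Steady-state metric (assumed): share of requests that succeeded."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def run_experiment(control: Sequence[bool], experimental: Sequence[bool],
                   tolerance: float = 0.01) -> bool:
    """Return True if the steady-state hypothesis survives the experiment.

    1. Steady state: success_rate over each group's request outcomes.
    2. Hypothesis: both groups show the same steady state (within tolerance).
    3. Real-world event: assumed to have been injected into the experimental
       group before its outcomes were collected.
    4. Try to disprove: compare the two groups' steady state.
    """
    difference = abs(success_rate(control) - success_rate(experimental))
    return difference <= tolerance
```

For example, a control group at 99% success and an experimental group at 94% differ by more than the tolerance, so `run_experiment` returns `False`: the hypothesis is disproved and the weakness becomes a target for improvement.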
---
> Build a Hypothesis around Steady State Behavior
> Focus on the measurable output of a system, rather than internal attributes of the system.
^145418658
**References:** https://instapaper.com/read/1386512250/15501454
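One way to make "measurable output" concrete, sketched in Python under assumptions: steady state is a band of mean plus or minus k standard deviations, learned from baseline samples of an output metric such as orders per minute. The band rule, `k = 3`, and the names `SteadyState` and `learn_steady_state` are hypothetical. The point is that the check looks only at what the system produces, not at internals like CPU usage or thread counts.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class SteadyState:
    """Expected band for an output metric (hypothetical example: orders/minute)."""
    low: float
    high: float

    def holds(self, observed: float) -> bool:
        # Steady state holds while the observed output stays inside the band.
        return self.low <= observed <= self.high

def learn_steady_state(baseline: Sequence[float], k: float = 3.0) -> SteadyState:
    """Derive a band of mean +/- k standard deviations from baseline samples.

    Needs at least two samples, because statistics.stdev does.
    """
    mean = statistics.fmean(baseline)
    spread = statistics.stdev(baseline)
    return SteadyState(mean - k * spread, mean + k * spread)
```

With baseline samples `[100, 102, 98, 101, 99]`, a reading of 100 falls inside the band and a reading of 90 does not.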
---
> Vary Real-world Events
> Chaos variables reflect real-world events. Prioritize events either by potential impact or estimated frequency.
^145418659
**References:** https://instapaper.com/read/1386512250/15501457
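A small sketch of prioritizing chaos variables, assuming each candidate event carries a hand-estimated impact score and a frequency estimate. The events, their scores and the `prioritize` helper are hypothetical; the principles only say to rank by one of these two criteria.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ChaosEvent:
    name: str
    impact: int       # hypothetical 1-5 score for potential impact
    frequency: float  # hypothetical estimate of occurrences per month

# Hypothetical candidates; real scores would come from incident history.
CANDIDATES = [
    ChaosEvent("server crash", impact=3, frequency=4.0),
    ChaosEvent("disk failure", impact=4, frequency=0.5),
    ChaosEvent("network partition", impact=5, frequency=1.0),
]

def prioritize(events: List[ChaosEvent], by: str) -> List[str]:
    """Return event names, most important first, ranked by 'impact' or 'frequency'."""
    if by not in ("impact", "frequency"):
        raise ValueError(f"unknown criterion: {by!r}")
    return [e.name for e in sorted(events, key=lambda e: getattr(e, by), reverse=True)]
```

With `CANDIDATES`, ranking by impact puts the network partition first, while ranking by frequency puts the server crash first.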
---
Run Experiments in Production ^145418660
**References:** https://instapaper.com/read/1386512250/15501460
---
To guarantee both authenticity of the way in which the system is exercised and relevance to the current deployed system, Chaos strongly prefers to experiment directly on production traffic. ^145418661
**References:** https://instapaper.com/read/1386512250/15501461
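A sketch of how an experiment might draw on real production traffic, assuming users are identified by a string ID. Hashing each ID into 10,000 buckets gives a stable assignment, and equal-sized control and experiment groups (1% each by default) keep the step-four comparison fair. `assign_group` and the percentages are hypothetical, not taken from any particular chaos platform.

```python
import hashlib

def assign_group(user_id: str, experiment_pct: float = 1.0) -> str:
    """Assign a production user to 'experiment', 'control', or 'unaffected'.

    A stable hash keeps each user in the same group for the whole run.
    Control and experiment groups each get experiment_pct percent of users;
    everyone else keeps receiving normal production behavior.
    """
    if not 0 <= experiment_pct <= 50:
        raise ValueError("experiment_pct must be between 0 and 50")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0.01% resolution
    threshold = round(experiment_pct * 100)  # percent -> buckets out of 10,000
    if bucket < threshold:
        return "experiment"
    if bucket < 2 * threshold:
        return "control"
    return "unaffected"
```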
---
> Automate Experiments to Run Continuously
> Running experiments manually is labor-intensive and ultimately unsustainable.
^145418662
**References:** https://instapaper.com/read/1386512250/15501462
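Automation can be as simple as a scheduler that reruns every registered experiment on a fixed interval and records each outcome. This Python sketch assumes each experiment is a zero-argument callable that returns `True` when the steady-state hypothesis held. `run_continuously` is a hypothetical name, and the `rounds` bound exists only so the example terminates; a real runner would loop indefinitely or be driven by a cron-like scheduler.

```python
import time
from typing import Callable, Dict, List, Tuple

# An experiment is any callable returning True if the steady-state hypothesis held.
Experiment = Callable[[], bool]

def run_continuously(experiments: Dict[str, Experiment], interval_s: float,
                     rounds: int) -> List[Tuple[int, str, bool]]:
    """Run every registered experiment once per round, with no human in the loop."""
    history: List[Tuple[int, str, bool]] = []
    for round_no in range(rounds):
        for name, experiment in experiments.items():
            held = experiment()
            history.append((round_no, name, held))
            if not held:
                # Hypothetical hook: a real setup might page a team or open a ticket.
                print(f"round {round_no}: hypothesis disproved by {name!r}")
        if round_no < rounds - 1:
            time.sleep(interval_s)  # wait before the next round
    return history
```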
---
Minimize Blast Radius ^145418663
**References:** https://instapaper.com/read/1386512250/15501464
---
While there must be an allowance for some short-term negative impact, it is the responsibility and obligation of the Chaos Engineer to ensure the fallout from experiments is minimized and contained. ^145418664
**References:** https://instapaper.com/read/1386512250/15501466
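A sketch of one containment mechanism: watch the experimental group's error rate while the fault is active, abort as soon as it crosses an agreed limit, and always remove the fault on the way out. The `stop_fault` callback, the error-rate feed and the threshold are hypothetical stand-ins for whatever a real tool provides.

```python
from typing import Callable, Iterable

def run_with_abort(observations: Iterable[float], max_error_rate: float,
                   stop_fault: Callable[[], None]) -> bool:
    """Watch the experimental group's error rate and halt the fault early.

    Returns True if the experiment ran to completion, False if it was aborted.
    """
    try:
        for error_rate in observations:
            if error_rate > max_error_rate:
                return False  # abort: impact exceeded the agreed limit
        return True
    finally:
        stop_fault()  # contain the fallout whether we finished, aborted, or raised
```

Because `stop_fault` runs in a `finally` clause, the injected failure is cleaned up whether the run completes, aborts early, or raises.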
---