# Open Source Solutions for Chaos Engineering in Kubernetes

URL:: https://blog.flant.com/chaos-engineering-in-kubernetes-open-source-tools/
Author:: flant.com
## Highlights
> And we’re getting back to “classic” Chaos Monkey: this tool, created by Netflix, is still used by the streaming service. It is currently integrated with the Spinnaker continuous delivery platform, so it works with any backend Spinnaker supports: AWS, Google Compute Engine, Azure, Kubernetes, Cloud Foundry. ([View Highlight](https://read.readwise.io/read/01febxknxjpsts5frx2ar0w7nm))
> 1. kube-monkey
> • GitHub: [https://github.com/asobti/kube-monkey](https://github.com/asobti/kube-monkey)
> • GitHub stars / contributors: ~2000 / 30
> • Written in: Go
> It is one of the oldest chaos tools designed for Kubernetes: the first public commits in its repository were made in December 2016. ([View Highlight](https://read.readwise.io/read/01febxmaycm6wc1edey44n52f2))
> The [ready-made kube-monkey chart](https://github.com/asobti/kube-monkey/blob/master/helm/kubemonkey/README.md) is the easiest way to install and run this utility. ([View Highlight](https://read.readwise.io/read/01febxmszqj0pr09cczy7s629f))
> ```yaml
> ---
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: nginx
>   namespace: test-monkeys
>   labels:
>     kube-monkey/enabled: enabled
>     kube-monkey/identifier: nginx
>     kube-monkey/kill-mode: random-max-percent
>     kube-monkey/kill-value: "100"
>     kube-monkey/mtbf: "1"
> spec:
>   selector:
>     matchLabels:
>       app: nginx
>   replicas: 5
>   template:
>     metadata:
>       labels:
>         app: nginx
>         kube-monkey/enabled: enabled
>         kube-monkey/identifier: nginx
>     spec:
>       containers:
>         - name: nginx
>           image: nginx:1.16
>           ports:
>             - containerPort: 80
> ---
> ```
> ([View Highlight](https://read.readwise.io/read/01febxp38r9rfqrd1kz6jyce6z))
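
The labels in the Deployment manifest only opt the application in; kube-monkey's own schedule is configured separately. As a hedged sketch, assuming the environment-variable configuration described in the kube-monkey README (upper-case variables prefixed with `KUBEMONKEY_`), the kube-monkey container itself could be configured like this. The values are illustrative assumptions; check the variable names against the version you deploy.

```yaml
# Fragment of the kube-monkey container spec (illustrative values).
env:
  # Only log what would be killed; set to "false" to actually delete pods.
  - name: KUBEMONKEY_DRY_RUN
    value: "true"
  # Hour of the day (0-23) at which kube-monkey builds the day's kill schedule.
  - name: KUBEMONKEY_RUN_HOUR
    value: "8"
  # Pod terminations are scheduled at random times between these hours.
  - name: KUBEMONKEY_START_HOUR
    value: "10"
  - name: KUBEMONKEY_END_HOUR
    value: "16"
  # Namespaces kube-monkey must never touch.
  - name: KUBEMONKEY_BLACKLISTED_NAMESPACES
    value: "kube-system"
  - name: KUBEMONKEY_TIME_ZONE
    value: "America/New_York"
```

As we understand the README, kube-monkey builds its schedule on weekdays only, so `kube-monkey/mtbf: "1"` makes the nginx Deployment a candidate every weekday, and `random-max-percent` with `kill-value: "100"` lets it terminate a random number of the Deployment's pods, up to all of them.
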
> 2. chaoskube
> • GitHub: [https://github.com/linki/chaoskube](https://github.com/linki/chaoskube)
> • GitHub stars / contributors: ~1300 / 20+
> • Written in: Go
> This tool also has a long history: its first release came out in November 2016. Chaoskube provides a [ready-made chart](https://hub.kubeapps.com/charts/stable/chaoskube) and a comprehensive manual for it. By default, it runs in dry-run mode, so no pods are actually terminated. ([View Highlight](https://read.readwise.io/read/01febxpbjdg31w6c8mt5mjz2ed))
> Chaoskube is a useful, convenient, and easy-to-use tool. However, it can only kill pods (just like kube-monkey). ([View Highlight](https://read.readwise.io/read/01febxpj9wkhcnghk93nchn2h8))
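
Since chaoskube only kills pods, its behaviour is driven by command-line flags. A minimal sketch of the container arguments, assuming the flags documented in the chaoskube README; the label selector and namespace reuse the nginx example from the kube-monkey section, and the interval is an arbitrary choice.

```yaml
# Fragment of a chaoskube container spec (illustrative values).
args:
  # Terminate a matching pod every 10 minutes.
  - --interval=10m
  # Only consider pods carrying this label...
  - --labels=app=nginx
  # ...in this namespace.
  - --namespaces=test-monkeys
  # Leave weekends alone.
  - --excluded-weekdays=Sat,Sun
  # Without this flag chaoskube stays in dry-run mode and only logs its choices.
  - --no-dry-run
```
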
> 3. Chaos Mesh
> • GitHub: [https://github.com/chaos-mesh/chaos-mesh](https://github.com/chaos-mesh/chaos-mesh)
> • GitHub stars / contributors: ~3300 / 80+
> • Written in: Go ([View Highlight](https://read.readwise.io/read/01febxpxn97xwrnpnebdg1vwq1))
> Chaos Mesh consists of two components:
> 1. *Chaos Operator* is its main component, which in turn consists of:
>    1. controller-manager (manages Custom Resources),
>    2. chaos-daemon (a privileged DaemonSet capable of managing network, cgroups, etc.),
>    3. a sidecar container that is dynamically inserted into the target pod to interfere with the I/O of the target application.
> 2. *Chaos Dashboard* is a web interface for managing and monitoring the chaos operator. ([View Highlight](https://read.readwise.io/read/01febxq4veg02g19c1daet37aq))
> And here is the list of actions (experiments) available:
> • *pod-kill*: kills the selected pod;
> • *pod-failure*: makes the pod unavailable for a specified time;
> • *container-kill*: kills the selected container in the pod;
> • *netem chaos*: network delays, packet duplication;
> • *network-partition*: simulates a network partition;
> • *IO chaos*: disk faults and read/write errors;
> • *time chaos*: injects clock skew into the selected pod;
> • *cpu-burn*: stresses the CPU of the selected pod;
> • *memory-burn*: stresses the memory of the selected pod;
> • *kernel chaos*: the target faces kernel errors, memory page faults, and block I/O problems;
> • *dns chaos*: injects DNS-related errors. ([View Highlight](https://read.readwise.io/read/01febxqv6ssft4n2dv2a7strwr))
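
Each experiment type is declared as a Custom Resource that controller-manager reconciles. A hedged sketch, assuming the `chaos-mesh.org/v1alpha1` API of Chaos Mesh 2.x (field names differ in older 1.x releases): the first object makes one nginx pod unavailable for 30 seconds (*pod-failure*), and the second adds 100 ms of latency to every nginx pod for one minute (*netem chaos*). Object names, namespace, and values are illustrative.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: nginx-pod-failure
  namespace: test-monkeys
spec:
  action: pod-failure
  # Affect exactly one randomly chosen pod that matches the selector.
  mode: one
  duration: "30s"
  selector:
    namespaces:
      - test-monkeys
    labelSelectors:
      app: nginx
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: nginx-delay
  namespace: test-monkeys
spec:
  action: delay
  # Affect every matching pod.
  mode: all
  duration: "60s"
  selector:
    namespaces:
      - test-monkeys
    labelSelectors:
      app: nginx
  delay:
    latency: "100ms"
    jitter: "10ms"
    correlation: "25"
```
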
> 4. Litmus Chaos
> • GitHub: [https://github.com/litmuschaos/litmus](https://github.com/litmuschaos/litmus)
> • GitHub stars / contributors: ~1600 / 120
> • Written in: TypeScript
> Litmus is another operator for creating, managing, and monitoring chaos in a Kubernetes cluster. It relies on three types of Custom Resources:
> • `ChaosExperiment` defines the experiment itself, required actions, and their schedule;
> • `ChaosEngine` connects an application or Kubernetes node to the specific `ChaosExperiment`;
> • `ChaosResult` stores the results of the experiment; the operator exports them as Prometheus metrics. ([View Highlight](https://read.readwise.io/read/01fehqyacz6mrbjpp59w1gragw))
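
The `ChaosEngine` is where an application and an experiment meet. As a hedged sketch, assuming the `litmuschaos.io/v1alpha1` API, that the generic `pod-delete` `ChaosExperiment` has already been installed from ChaosHub, and that a service account named `pod-delete-sa` (a hypothetical name) holds the RBAC permissions the experiment needs, the engine below would run pod deletion against the nginx Deployment; the outcome then lands in a `ChaosResult`.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: test-monkeys
spec:
  # Target application: pods of the nginx Deployment.
  appinfo:
    appns: test-monkeys
    applabel: app=nginx
    appkind: deployment
  # Set to "stop" to abort a running experiment.
  engineState: active
  # Hypothetical service account with the permissions pod-delete requires.
  chaosServiceAccount: pod-delete-sa
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            # Total chaos duration and the interval between deletions, in seconds.
            - name: TOTAL_CHAOS_DURATION
              value: "30"
            - name: CHAOS_INTERVAL
              value: "10"
            # "false" requests graceful deletion; "true" forces it.
            - name: FORCE
              value: "false"
```
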
> 5. Chaos Toolkit
> • GitHub stars / contributors: ~1200 / 10+
> • Written in: Python
> Chaos Toolkit is a set of Python tools built around an open API (a declarative experiment specification) for conducting chaos experiments. It has a large number of extensions for various providers and environments, including [chaostoolkit-kubernetes](https://github.com/chaostoolkit/chaostoolkit-kubernetes), the one relevant here (although it has significantly fewer GitHub stars: ~150).
> You can deploy the Chaos Toolkit Operator from Kubernetes manifests applied via Kustomize ([https://docs.chaostoolkit.org/deployment/k8s/operator/](https://docs.chaostoolkit.org/deployment/k8s/operator/)). ([View Highlight](https://read.readwise.io/read/01fehqzs5cn51z759g6e64ryvz))
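
Chaos Toolkit experiments are declarative files run with the `chaos` CLI (for example, `chaos run experiment.yaml`). A hedged sketch, assuming the `chaostoolkit-kubernetes` extension is installed and provides `terminate_pods` in `chaosk8s.pod.actions`: the experiment checks a steady-state hypothesis (the service answers with HTTP 200), kills one random nginx pod, and re-checks the hypothesis. The probe URL points at a hypothetical `nginx` Service in `test-monkeys`.

```yaml
version: 1.0.0
title: nginx survives losing a pod
description: Terminate one random nginx pod and verify the service still responds.
steady-state-hypothesis:
  title: nginx responds
  probes:
    - type: probe
      name: nginx-returns-200
      # For HTTP probes, an integer tolerance is compared with the status code.
      tolerance: 200
      provider:
        type: http
        # Hypothetical Service URL; adjust to your cluster.
        url: http://nginx.test-monkeys.svc.cluster.local/
method:
  - type: action
    name: terminate-one-nginx-pod
    provider:
      type: python
      module: chaosk8s.pod.actions
      func: terminate_pods
      arguments:
        label_selector: app=nginx
        ns: test-monkeys
        # Pick the victim at random instead of by name order.
        rand: true
        qty: 1
    # Give the Deployment time to replace the pod before re-checking.
    pauses:
      after: 10
```
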
> 6. KubeInvaders (and similar projects)
> • GitHub stars / contributors: ~700 / <10
> • Written in: JavaScript
> We conclude our review of chaos tools for Kubernetes with a very unusual tool — a game! ([View Highlight](https://read.readwise.io/read/01fehr0739b85p278f0pkhkffj))
> Here are some other projects that were not included in the full review for various reasons:
> 1. [PowerfulSeal](https://github.com/powerfulseal/powerfulseal) (~1600 stars on GitHub) is an advanced Python tool for injecting various problems into Kubernetes clusters. It offers several modes of operation: you can define YAML policies and run experiments automatically, break things manually in “interactive” mode and observe the consequences, or label the pods that should be killed. PowerfulSeal supports a variety of cloud providers (AWS, Azure, GCP, OpenStack) as well as local environments. It can also export metrics to Prometheus and Datadog.
> 2. [Pod-reaper](https://github.com/target/pod-reaper) is a rule-based pod killer, written in Go, that runs its experiments on a cron schedule. This [advanced example](https://github.com/target/pod-reaper/blob/master/examples/complex-deployment.yml) shows its main features. Note that its repository has not been updated since November 2020.
> 3. [Kube-entropy](https://github.com/alexlokshin/kube-entropy) is an application for testing web services in Kubernetes by monitoring changes in the HTTP status codes returned by selected ingresses. It is also written in Go and has not been updated since May 2020.
> 4. [Fabric8 Chaos Monkey](https://fabric8.io/guide/chaosMonkey.html) is a chaos monkey implementation for Fabric8, an Open Source microservices platform based on Docker, Kubernetes, and Jenkins. You can install it directly from the Fabric8 interface.
> 5. [Kubernetes by Gremlin](https://www.gremlin.com/kubernetes/) is a **non**-Open Source commercial service from renowned experts in chaos engineering. It performs a comprehensive audit of Kubernetes clusters for reliability and fault tolerance. A free plan with limited functionality is also available.
> 6. [Mangle by VMware](https://vmware.github.io/mangle/) is another Open Source tool for running chaos experiments against applications and infrastructure components. It can inject faults with minimal pre-configuration and supports a range of infrastructures (K8s, Docker, vCenter, or any remote machine with SSH enabled). ([View Highlight](https://read.readwise.io/read/01fehr12trv6zx87mdkc50yw5e))
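
For comparison, a PowerfulSeal YAML policy for the same nginx scenario might look roughly like this. It is a sketch modeled on the PowerfulSeal 3.x policy format as we understand it (scenarios built from `podAction` and `wait` steps); treat the exact keys as assumptions and validate the file against the schema shipped with your version.

```yaml
# Run the scenario once and exit.
config:
  runStrategy:
    runs: 1
scenarios:
  - name: Kill one nginx pod and let the Deployment recover
    steps:
      - podAction:
          # Select candidate pods by namespace and label selector.
          matches:
            - labels:
                namespace: test-monkeys
                selector: app=nginx
          # Keep a single pod, chosen at random.
          filters:
            - randomSample:
                size: 1
          actions:
            - kill:
                probability: 1
                force: true
      # Give Kubernetes time to reschedule the pod.
      - wait:
          seconds: 30
```
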
---
Title: Open Source Solutions for Chaos Engineering in Kubernetes
Author: flant.com
Tags: readwise, articles
date: 2024-01-30
---
## AI-Generated Summary
This article reviews existing tools for implementing chaos engineering in K8s, including kube-monkey, chaoskube, Chaos Mesh, Litmus Chaos, Chaos Toolkit, some games, and more.