Conference: [[TestCon Europe 2021]] ([speaker page](https://testcon.lt/Nicole-van-der-Hoeven/))

September 9th, 2021, 10:55-11:40 (UTC+2) - 45 minutes

## Structure

- Introduction (2 min)
  - Who I am (briefly)
  - I love Pokémon
- What is chaos engineering?
  - Four steps in unleashing chaos:
    - I. Define and prepare the steady state
    - II. Build a hypothesis
    - III. Execute chaos experiments to disprove the hypothesis
    - IV. Analyze results
  - Chaos engineering IS a type of testing
    - Shouldn't we be coding for these in our tests?
  - Observability is crucial: Schrödinger's cat.
    - We need visibility into the box
- I. Defining and preparing the steady state/baseline
  - The application: Poké API on a managed k8s cluster on DigitalOcean
  - Setting up Grafana
    - k6 integration
    - Grafana plugin for Kubernetes
  - Running a baseline load test
- II. Formulating hypotheses
  - If we terminate one **APP** pod, the error rate should stay at or below 5% and the p95 response time at or below 5 seconds.
  - If we terminate one **WEB** pod, the error rate should stay at or below 5% and the p95 response time at or below 5 seconds.
  - If we starve the CPU to 90%, the error rate should stay at or below 5% and the p95 response time at or below 5 seconds.
- III. Running chaos experiments with k6 and xk6-chaos
  - Using the same script for a load test as for a chaos experiment
  - What the script looks like
  - Extensibility of k6
- IV. Seeing the results on Grafana
- Conclusion
  - Think about how you can incorporate chaos testing principles and practices into your own testing.
  - Without observability, how will you know whether the Pokémon is alive or dead?

## Slides

[[sources/Presentation/Mine/slides/content/2021-schrodingers-pokemon/_index]]

## To do

- [ ] Create a new binary of k6 with the later version of xk6-chaos (see the build sketch below)
- [ ] Get the current script running with the new binary
- [ ] Add CPU starvation - talk to Daniel about this
- [ ] Add memory starvation?

## Tests

- Response time is the 95th percentile, specifically for `01_GetPokemon` requests that return a 200.
- Error rate is the average over the entire test.
- CPU and memory are for the application servers, not the load generator server.
- RPS is peak RPS.
- Memory and CPU are max total/single utilization.

These criteria map onto k6 thresholds, as shown in the script sketch after this section.
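The first to-do item means compiling a custom k6 binary with the extension baked in. A minimal sketch of that build step, assuming the extension's module path is `github.com/simskij/xk6-chaos` (worth verifying against the repo before building):

```
# Install the xk6 builder, then compile a k6 binary that includes xk6-chaos
go install go.k6.io/xk6/cmd/xk6@latest
xk6 build --with github.com/simskij/xk6-chaos@latest
```

The pass/fail criteria above translate directly into k6 thresholds, which is what lets one script double as both the baseline load test and the chaos experiment. A minimal sketch of such a script; the base URL, duration, and sleep are placeholder assumptions, and the xk6-chaos calls themselves are omitted:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 10, // 40 for the A runs
  duration: '1h', // assumed; match the real test plan
  thresholds: {
    // Hypothesis: error rate stays at or below 5%...
    http_req_failed: ['rate<=0.05'],
    // ...and p95 response time for 01_GetPokemon stays at or below 5s
    'http_req_duration{name:01_GetPokemon}': ['p(95)<=5000'],
  },
};

export default function () {
  // Placeholder URL for the PokéAPI deployment on DigitalOcean
  const res = http.get('http://POKEAPI_HOST/api/v2/pokemon/1', {
    tags: { name: '01_GetPokemon' },
  });
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```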
| Run | VUs | Description | Response time (p95) | Error rate | Requests | CPU % (total/single) | Memory % | Date and Time | URLs |
| --- | --- | ------------------------------ | ------------------- | ---------- | -------- | -------------------- | -------- | ------------- | ---- |
| 00 | 10 | Baseline | 4.223s | 1.15% | 13,794 | | | 19/08 23:20 | [k6](https://app.k6.io/runs/1079485); Grafana: |
| A1 | 40 | 40 user load | ❌ 17.663s | ✅ 1.22% | 20,492 | 74.3/80.4 | 58.1 | 23/08 13:57 | [k6](https://app.k6.io/runs/1081789); [Grafana](http://localhost:3000/d/UCJUx0M7k/kubernetes-cluster-monitoring-via-prometheus?orgId=1&from=1629719400000&to=1629724200000&var-interval=%24__auto_interval_interval&var-datasource=default&var-Node=All) |
| B1 | 10 | App pod termination | ❌ 5.311s | ✅ 1.14% | 12,242 | 48.0/67.0 | 62.3 | 23/08 18:58 | [k6](https://app.k6.io/runs/1082038); [Grafana](http://localhost:3000/d/UCJUx0M7k/kubernetes-cluster-monitoring-via-prometheus?orgId=1&from=1629737400000&to=1629742200000&var-interval=%24__auto_interval_interval&var-datasource=default&var-Node=All) |
| C1 | 10 | Web pod termination | ❌ 5.087s | ✅ 1.09% | 12,412 | 52.2/67.8 | 60.4 | 23/08 17:37 | [k6](https://app.k6.io/runs/1081969); [Grafana](http://localhost:3000/d/UCJUx0M7k/kubernetes-cluster-monitoring-via-prometheus?orgId=1&from=1629732600000&to=1629737400000&var-interval=%24__auto_interval_interval&var-datasource=default&var-Node=All) |
| A2 | 40 | 40 user load, no limits | ❌ 13.311s | ✅ 1.15% | 26,582 | 96.4/100.0 | 60.7 | 06/09 11:57 | [k6](https://app.k6.io/runs/1094880); [Grafana](http://localhost:3000/d/UCJUx0M7k/kubernetes-cluster-monitoring-via-prometheus?orgId=1&from=1630922303814&to=1630926030714&var-interval=$__auto_interval_interval&var-datasource=default&var-Node=All) |
| B3 | 10 | App pod termination, no limits | ✅ 4.767s | ✅ 1.11% | 14,001 | 58.3/97.2 | 63.7 | 06/09 13:08 | [k6](https://app.k6.io/runs/1094948); [Grafana](http://localhost:3000/d/UCJUx0M7k/kubernetes-cluster-monitoring-via-prometheus?orgId=1&from=1630926578458&to=1630930853254&var-interval=$__auto_interval_interval&var-datasource=default&var-Node=pokeapi-8m9dn) |
| C3 | 10 | Web pod termination, no limits | ❌ 6.37s | ✅ 1.14% | 12,416 | 51.2/96.0 | 62.0 | 06/09 14:19 | [k6](https://app.k6.io/runs/1095021); [Grafana] |
| C4 | 10 | Web pod termination, no limits | ✅ 4.63s | ✅ 1.02% | 14,648 | 55.8/89.6 | 58.5 | 06/09 17:09 | [k6](https://app.k6.io/runs/1095214); [Grafana](http://localhost:3000/d/UCJUx0M7k/kubernetes-cluster-monitoring-via-prometheus?orgId=1&from=1630941013932&to=1630944882936&var-interval=%24__auto_interval_interval&var-datasource=default&var-Node=pokeapi-8m9v9) |

## Figuring stuff out

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install [RELEASE_NAME] prometheus-community/prometheus-postgres-exporter
```

https://github.com/prometheus-community/helm-charts/
https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-postgres-exporter

Installing the Grafana plugin:

- Create a `values.yaml` using [this](https://raw.githubusercontent.com/grafana/helm-charts/main/charts/grafana/values.yaml) as a template:

```yaml
## Pass the plugins you want installed as a list.
##
plugins:
  # - digrich-bubblechart-panel
  # - grafana-clock-panel
  - grafana-k6cloud-datasource
```
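Defining the plugin list is only half the job; the values file still has to be applied to the running Grafana release. A sketch, assuming the release is named `grafana` and the file is saved as `grafana-values.yaml` (both names are assumptions):

```
# Add the Grafana chart repo if it isn't there yet
helm repo add grafana https://grafana.github.io/helm-charts
# Apply the plugin list to the existing release without reinstalling it
helm upgrade grafana grafana/grafana -f grafana-values.yaml
```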
- Trying to figure out how to get the Prometheus Postgres exporter working:
  - `helm install postgres-exp prometheus-community/prometheus-postgres-exporter`
  - `helm uninstall postgres-exp`
  - `helm install postgres-exp prometheus-community/prometheus-postgres-exporter -f pg-values.yaml`
  - To upgrade an already installed release without re-installing it: `helm upgrade prometheus prometheus-community/prometheus -f prometheus-values.yaml`
- First, I needed to install the Prometheus Postgres exporter.
- Then, I had to configure the exporter to get access to my Postgres DB (`pg-values.yaml`):

```yaml
config:
  datasource:
    # Specify one of both datasource or datasourceSecret
    host: db
    user: ash
    # Only one of password and passwordSecret can be specified
    password: pokemon
    # Specify passwordSecret if DB password is stored in secret.
    passwordSecret: {}
      # Secret name
      # name:
      # Password key inside secret
      # key:
    port: 5432
    database: pokeapi
    sslmode: disable
```

- Then, I had to re-install/upgrade Prometheus to scrape the exporter (`prometheus-values.yaml`):

```yaml
extraScrapeConfigs: |
  - job_name: postgres-extractor
    scrape_interval: 5m
    static_configs:
      - targets: ['postgres-exp-prometheus-postgres-exporter:80']
```

And `helm upgrade prometheus prometheus-community/prometheus -f prometheus-values.yaml` to upgrade Prometheus to use that new scraping config.

### 12:27

There is a Grafana PostgreSQL data source plugin, and these settings work:

![[grafana-postgres-datasource.png]]

`db` is the name of the pod. However, Daniel said that the data source plugin is only for running queries on the DB. To expose the metrics, I still need to use the exporter + scraper.

### 12:47

So, now I'm getting SOME metrics going from my Postgres DB > Prometheus > Grafana, but only some of them. Where are the others?
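One way to narrow down where the missing metrics disappear is to check what the exporter itself exposes, before Prometheus and Grafana enter the picture. A sketch using the service name from the scrape config above; the local port 9187 is an arbitrary choice:

```
# Forward the exporter's service port (80, per the scrape target) to localhost
kubectl port-forward svc/postgres-exp-prometheus-postgres-exporter 9187:80
# In a second terminal: list the pg_* series the exporter actually exposes
curl -s http://localhost:9187/metrics | grep '^pg_'
```

If a metric never shows up here, the gap is in the exporter's configuration rather than in the Prometheus scrape job or the Grafana dashboard.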