# [[How to install Kepler]]
![[How to install Kepler.svg]]
Kepler (Kubernetes Efficient Power Level Exporter) tracks energy consumption in Kubernetes clusters. This guide covers installation on cloud environments (GKE, EKS, AKS) without hardware power sensors, using model-based estimation.
## Prerequisites
- Kubernetes 1.20+
- kubectl with cluster-admin access
- Linux kernel 4.18+ with eBPF support
- Prometheus (for metrics collection)
- Grafana (optional, for visualization)
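A quick way to sanity-check these prerequisites from your workstation is sketched below; these are standard kubectl calls, and the kernel check simply reads the version each node reports.
```bash
# Confirm kubectl can reach the cluster and the server is 1.20+
kubectl version

# Confirm the current context effectively has cluster-admin rights
kubectl auth can-i '*' '*' --all-namespaces

# Check each node's kernel version (4.18+ is needed for eBPF)
kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
```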
## Installation Steps
### 1. Create the Kepler Manifests
Create a file named `kepler.yaml` with the following content:
```yaml
---
# Kepler Namespace
apiVersion: v1
kind: Namespace
metadata:
name: kepler
---
# Kepler ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
name: kepler
namespace: kepler
---
# Kepler ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kepler
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/metrics
- nodes/stats
- nodes/proxy
- pods
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources:
- nodes/stats
verbs: ["get"]
- nonResourceURLs:
- /metrics
verbs: ["get"]
---
# Kepler ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kepler
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kepler
subjects:
- kind: ServiceAccount
name: kepler
namespace: kepler
---
# Kepler ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: kepler-config
namespace: kepler
data:
KEPLER_NAMESPACE: "kepler"
ENABLE_GPU: "false"
METRICS_PORT: "9102"
BIND_ADDRESS: "0.0.0.0:9102"
# CRITICAL: Disable RAPL for cloud environments
ENABLE_RAPL: "false"
ENABLE_PLATFORM_RAPL: "false"
ENABLE_EBPF_CGROUPID: "true"
# Enable model-based estimation
ESTIMATOR: "true"
MODEL_SERVER_ENABLE: "true"
MODEL_SERVER_URL: "http://kepler-model-server.kepler.svc.cluster.local:8100"
MODEL_CONFIG: "CONTAINER_COMPONENTS_ESTIMATOR=true,CONTAINER_COMPONENTS_INIT_URL=https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-server/main/tests/test_models/DynComponentModelWeight/CgroupOnly/ScikitMixed/ScikitMixed.json"
---
# Kepler DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kepler
namespace: kepler
labels:
app.kubernetes.io/name: kepler
app.kubernetes.io/component: exporter
spec:
selector:
matchLabels:
app.kubernetes.io/name: kepler
app.kubernetes.io/component: exporter
template:
metadata:
labels:
app.kubernetes.io/name: kepler
app.kubernetes.io/component: exporter
spec:
serviceAccountName: kepler
hostNetwork: true
hostPID: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: kepler
image: quay.io/sustainable_computing_io/kepler:release-0.7.11
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
ports:
- name: http-metrics
containerPort: 9102
hostPort: 9102
livenessProbe:
httpGet:
path: /healthz
port: 9102
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 60
timeoutSeconds: 10
failureThreshold: 5
envFrom:
- configMapRef:
name: kepler-config
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: NODE_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
volumeMounts:
- name: lib-modules
mountPath: /lib/modules
readOnly: true
- name: tracing
mountPath: /sys
readOnly: true
- name: proc
mountPath: /proc
readOnly: true
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
volumes:
- name: lib-modules
hostPath:
path: /lib/modules
type: Directory
- name: tracing
hostPath:
path: /sys
type: Directory
- name: proc
hostPath:
path: /proc
type: Directory
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
---
# Kepler Service
apiVersion: v1
kind: Service
metadata:
name: kepler
namespace: kepler
labels:
app.kubernetes.io/name: kepler
app.kubernetes.io/component: exporter
spec:
type: ClusterIP
clusterIP: None
ports:
- name: http-metrics
port: 9102
targetPort: http-metrics
protocol: TCP
selector:
app.kubernetes.io/name: kepler
app.kubernetes.io/component: exporter
---
# Kepler Model Server Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: kepler-model-server
namespace: kepler
labels:
app.kubernetes.io/name: kepler-model-server
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: kepler-model-server
template:
metadata:
labels:
app.kubernetes.io/name: kepler-model-server
spec:
containers:
- name: model-server
image: quay.io/sustainable_computing_io/kepler_model_server:v0.7
imagePullPolicy: IfNotPresent
command:
- python3
- -u
- src/server/model_server.py
ports:
- name: http
containerPort: 8100
livenessProbe:
httpGet:
path: /healthz
port: 8100
initialDelaySeconds: 30
periodSeconds: 60
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 1Gi
---
# Kepler Model Server Service
apiVersion: v1
kind: Service
metadata:
name: kepler-model-server
namespace: kepler
labels:
app.kubernetes.io/name: kepler-model-server
spec:
type: ClusterIP
ports:
- name: http
port: 8100
targetPort: http
protocol: TCP
selector:
app.kubernetes.io/name: kepler-model-server
```
### 2. Deploy Kepler
```bash
kubectl apply -f kepler.yaml
```
### 3. Verify Installation
```bash
# Check pods are running
kubectl get pods -n kepler
# Expected output: all pods should show 1/1 Running

# View Kepler logs
kubectl logs -n kepler -l app.kubernetes.io/name=kepler --tail=20

# Test the metrics endpoint
kubectl exec -n kepler daemonset/kepler -- curl -s localhost:9102/metrics | grep kepler_container
```
## Prometheus Configuration
Add this scrape job to your Prometheus configuration:
```yaml
scrape_configs:
  - job_name: 'kepler'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - kepler
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: kepler
        action: keep
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
        regex: exporter
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: ${1}:9102
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
    scrape_interval: 30s
```
Restart Prometheus to apply the configuration:
```bash
kubectl rollout restart deployment/prometheus -n <prometheus-namespace>
```
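To confirm that Prometheus actually picked up the new job, you can query its targets API. This is only a sketch: the service name, namespace, and port below are assumptions and depend on how Prometheus is deployed in your cluster.
```bash
# Adjust the service name and namespace to your Prometheus installation
kubectl -n <prometheus-namespace> port-forward svc/prometheus 9090:9090 &

# Non-empty output means the kepler job has active targets
curl -s 'http://localhost:9090/api/v1/targets?state=active' | grep -o '"job":"kepler"'
```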
## Key Metrics
### Power Consumption
```promql
# rate() over a joules counter is joules per second, i.e. watts

# Total cluster power (Watts)
sum(rate(kepler_node_platform_joules_total[1m]))

# Power per pod (Watts)
sum(rate(kepler_container_joules_total[1m])) by (pod_name)

# Power per namespace (Watts)
sum(rate(kepler_container_joules_total[1m])) by (container_namespace)

# Power per node (Watts)
sum(rate(kepler_node_platform_joules_total[1m])) by (node)
```
### Energy and Cost
```promql
# Total energy (kWh); 1 kWh = 3,600,000 J
sum(kepler_node_platform_joules_total) / 3600000

# Energy per pod (kWh)
sum(kepler_container_joules_total) by (pod_name) / 3600000

# Estimated CO2 emissions (grams, US grid average 475 g/kWh)
(sum(kepler_node_platform_joules_total) / 3600000) * 475

# Estimated cost (USD at $0.12/kWh)
(sum(kepler_node_platform_joules_total) / 3600000) * 0.12

# Note: these counters accumulate since each exporter started and reset on restart;
# use increase() over a time window (as in the Use Cases section) for a bounded period.
```
### Efficiency Metrics
```promql
# Request throughput per watt (equivalently, requests per joule)
sum(rate(http_requests_total[1m])) / sum(rate(kepler_container_joules_total[1m]))

# CPU efficiency (cores used per watt)
sum(rate(container_cpu_usage_seconds_total[1m])) / sum(rate(kepler_container_joules_total[1m]))
```
## Configuration Details
### Critical Settings for Cloud Environments
```yaml
ENABLE_RAPL: "false"          # Disable hardware sensor requirement
ENABLE_PLATFORM_RAPL: "false" # No RAPL available on cloud VMs
ESTIMATOR: "true"             # Enable model-based estimation
MODEL_SERVER_ENABLE: "true"   # Use the ML model server for estimates
```
### How Estimation Works
Without hardware power sensors (RAPL), Kepler estimates power consumption based on:
- CPU utilization patterns
- Memory usage
- Network I/O activity
- Disk operations
- Machine learning models trained on real hardware data
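To confirm which mode the exporter actually selected at runtime, the startup logs are the quickest signal. The exact log wording varies between Kepler versions, so the grep pattern below is only a rough filter.
```bash
# Look for estimator/model-server/RAPL related startup messages
kubectl logs -n kepler -l app.kubernetes.io/name=kepler --tail=200 | grep -iE 'model|estimat|rapl'

# The model server should also log once it is serving a model
kubectl logs -n kepler deployment/kepler-model-server --tail=50
```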
**Accuracy:** Approximately 85-90% for relative comparisons between workloads. Less accurate for absolute power values.
**Best Use Cases:**
- Comparing efficiency of different implementations
- Identifying power-hungry workloads
- Tracking trends over time
- Cost attribution by namespace/team
## Grafana Dashboard
### Quick Import
1. In Grafana, navigate to Dashboards → Import
2. Enter dashboard ID: `18681` (official Kepler dashboard)
3. Select your Prometheus datasource
4. Click Import
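If Grafana is managed by a Helm chart with the dashboard sidecar enabled (as in kube-prometheus-stack), the dashboard can also be loaded declaratively. This is a sketch under that assumption: download the dashboard JSON from grafana.com yourself, and adjust the `monitoring` namespace and the `grafana_dashboard` label key to match your installation.
```bash
# Assumes the dashboard JSON has been saved locally as kepler-dashboard.json
kubectl -n monitoring create configmap kepler-dashboard \
  --from-file=kepler-dashboard.json

# The sidecar picks up ConfigMaps carrying this label (key/value are common chart defaults)
kubectl -n monitoring label configmap kepler-dashboard grafana_dashboard=1
```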
### Manual Dashboard Creation
Create panels with these queries:
**Cluster Power (Gauge)**
```promql
sum(rate(kepler_node_platform_joules_total[1m]))
```
**Top 10 Power Consumers (Table)**
```promql
topk(10, sum(rate(kepler_container_joules_total[5m])) by (pod_name, container_namespace))
```
**Power by Namespace (Pie Chart)**
```promql
sum(rate(kepler_container_joules_total[5m])) by (container_namespace)
```
**Power Over Time (Time Series)**
```promql
sum(rate(kepler_node_platform_joules_total[1m])) by (node)
```
**Energy Efficiency Score (Stat)**
```promql
sum(rate(http_requests_total[1m])) / sum(rate(kepler_container_joules_total[1m]))
```
## Troubleshooting
### Pods Stuck in ContainerCreating
Check for volume mount errors:
```bash
kubectl describe pod -n kepler <pod-name>
```
Common issue on GKE: a read-only root filesystem prevents certain host-path mounts. Remove volume mounts the exporter does not need, such as the `kernel-src` or `kernel-debug` mounts found in some upstream manifests (the manifest above does not include them). The events check below also helps.
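Recent namespace events usually point at the failing mount directly:
```bash
# Most recent events in the kepler namespace, newest last
kubectl get events -n kepler --sort-by=.lastTimestamp | tail -20
```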
### Pods Crash with "no RAPL zones found"
Ensure the ConfigMap contains these settings:
```yaml
ENABLE_RAPL: "false"
ENABLE_PLATFORM_RAPL: "false"
```
This error occurs when Kepler tries to use hardware sensors that don't exist in cloud environments.
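If you want to confirm that the node really exposes no RAPL interface (the usual situation on cloud VMs), you can inspect the powercap sysfs tree from a node debug pod. The busybox image is an arbitrary choice.
```bash
# kubectl debug mounts the node's root filesystem at /host;
# an empty listing or "No such file or directory" means no RAPL zones are exposed
kubectl debug node/<node-name> -it --image=busybox -- ls /host/sys/class/powercap
```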
### No Metrics Appearing
Wait 2-3 minutes after deployment, then check:
```bash
# Verify metrics are exposed
kubectl port-forward -n kepler svc/kepler 9102:9102
curl localhost:9102/metrics | grep kepler_

# Check that Prometheus is scraping:
# in the Prometheus UI, go to Status → Targets and look for the "kepler" job
```
### Model Server Won't Start
Check logs:
```bash
kubectl logs -n kepler deployment/kepler-model-server
```
Common fix: Ensure the command and args are properly set in the deployment spec.
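You can print the container command actually set on the Deployment and compare it with the manifest above:
```bash
# Should print ["python3","-u","src/server/model_server.py"]
kubectl -n kepler get deployment kepler-model-server \
  -o jsonpath='{.spec.template.spec.containers[0].command}'
```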
### Metrics Show Zero or NaN
This can happen when:
- Pods just started (wait 2-3 minutes to collect data)
- No workload is running in the cluster
- Model server is still initializing
Check model server readiness:
```bash
kubectl get pods -n kepler -l app.kubernetes.io/name=kepler-model-server
```
## Resource Usage
### Kepler DaemonSet (per node)
- CPU: ~100m (requests), up to 500m (limits)
- Memory: ~128Mi (requests), up to 512Mi (limits)
- Overall overhead: ~2-5% per node
### Model Server
- CPU: ~100m (requests), up to 500m (limits)
- Memory: ~256Mi (requests), up to 1Gi (limits)
## Security Considerations
Kepler requires:
- **Privileged containers** (for eBPF probes)
- **Host network** access
- **Host PID** namespace access
- **Read access** to `/sys`, `/proc`, `/lib/modules`
Review your organization's security policies before deploying Kepler in production.
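If the cluster enforces the built-in Pod Security Standards, the baseline or restricted profiles will reject a privileged, host-network DaemonSet. Labeling the namespace with the privileged profile is one way to allow it; third-party policy engines need their own exemptions.
```bash
# Allow privileged pods in the kepler namespace (Pod Security Admission)
kubectl label namespace kepler \
  pod-security.kubernetes.io/enforce=privileged --overwrite
```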
## Best Practices
1. **Use specific image versions** (not `:latest`) for production deployments
2. **Establish baselines** by measuring idle cluster power before running workloads
3. **Focus on relative comparisons** rather than absolute values when using estimation mode
4. **Set resource limits** to prevent Kepler from consuming excessive resources
5. **Monitor estimation accuracy** by comparing similar workloads
6. **Regular updates** to get improved ML models and bug fixes
7. **Create alerts** for abnormally high power consumption (a sample rule sketch follows this list)
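As a starting point for the alerting practice above, here is a sketch of a rule. It assumes the Prometheus Operator's PrometheusRule CRD is available and that your Prometheus instance selects rules from the kepler namespace; the 1 kW threshold and the labels are placeholders to adapt.
```bash
kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kepler-power-alerts
  namespace: kepler
spec:
  groups:
    - name: kepler.power
      rules:
        - alert: ClusterPowerHigh
          # Total estimated cluster power draw above 1 kW for 15 minutes
          expr: sum(rate(kepler_node_platform_joules_total[5m])) > 1000
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Cluster power draw is unusually high"
EOF
```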
## Monitoring Kepler Itself
```bash
# Check Kepler pod health
kubectl get pods -n kepler -w

# View Kepler exporter logs
kubectl logs -n kepler -l app.kubernetes.io/name=kepler -f

# Check model server logs
kubectl logs -n kepler -l app.kubernetes.io/name=kepler-model-server -f

# Check Kepler's own CPU and memory usage
kubectl top pods -n kepler

# Test the metrics endpoint
kubectl port-forward -n kepler daemonset/kepler 9102:9102
curl http://localhost:9102/metrics | grep kepler_ | wc -l
```
## Uninstalling Kepler
```bash
# Remove all Kepler components
kubectl delete namespace kepler

# Remove cluster-level resources
kubectl delete clusterrole kepler
kubectl delete clusterrolebinding kepler

# Remove the Prometheus scrape config (manual step):
# edit your Prometheus configuration and remove the kepler job
```
## Upgrading Kepler
```bash
# Update the image version in kepler.yaml
# Change: quay.io/sustainable_computing_io/kepler:release-0.7.11
# To:     quay.io/sustainable_computing_io/kepler:release-0.7.12 (example)
kubectl apply -f kepler.yaml

# Verify the rollout
kubectl rollout status daemonset/kepler -n kepler
kubectl rollout status deployment/kepler-model-server -n kepler
```
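For a quick image bump without editing the file, `kubectl set image` works as well; keep `kepler.yaml` in sync afterwards so a later `kubectl apply` does not roll the version back.
```bash
# Example only: substitute the tag you are upgrading to
kubectl -n kepler set image daemonset/kepler \
  kepler=quay.io/sustainable_computing_io/kepler:release-0.7.12
```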
## Hardware-Based Mode (Bare Metal)
If you're running on bare metal with RAPL support, modify the ConfigMap:
```yaml
ENABLE_RAPL: "true"           # Enable hardware sensors
ENABLE_PLATFORM_RAPL: "true"  # Use the platform RAPL interface
ESTIMATOR: "false"            # Disable estimation (use real measurements)
MODEL_SERVER_ENABLE: "false"  # Model server not needed
```
This yields direct hardware measurements (roughly 95%+ accuracy for the CPU package and DRAM domains that RAPL covers) instead of model-based estimates.
## Additional Resources
- **Official Documentation:** https://sustainable-computing.io/
- **GitHub Repository:** https://github.com/sustainable-computing-io/kepler
- **Model Server:** https://github.com/sustainable-computing-io/kepler-model-server
- **Community:** #kepler channel in Kubernetes Slack
- **Dashboards:** https://grafana.com/grafana/dashboards/?search=kepler
## Use Cases
### 1. Cost Attribution
Track energy costs per team or project:
```promql
# Daily cost per namespace (assumes $0.12/kWh)
(sum(increase(kepler_container_joules_total[24h])) by (container_namespace) / 3600000) * 0.12
```
### 2. Workload Optimization
Compare power efficiency of different implementations:
```promql
# Energy per request (joules) for service A (the pod name patterns are examples)
sum(rate(kepler_container_joules_total{pod_name=~"service-a.*"}[5m]))
  / sum(rate(http_requests_total{pod=~"service-a.*"}[5m]))

# ...compared with service B
sum(rate(kepler_container_joules_total{pod_name=~"service-b.*"}[5m]))
  / sum(rate(http_requests_total{pod=~"service-b.*"}[5m]))
```
### 3. Sustainability Reporting
Generate monthly carbon emission reports:
```promql
# Monthly CO2 in kilograms (US grid average 0.475 kg/kWh)
(sum(increase(kepler_node_platform_joules_total[30d])) / 3600000) * 0.475
```
### 4. Autoscaling Insights
Correlate power consumption with pod scaling:
```promql
# Power draw in the default namespace (watts)
sum(rate(kepler_container_joules_total{container_namespace="default"}[1m]))

# ...compared against the number of running pods in that namespace
count(kube_pod_status_phase{namespace="default", phase="Running"})
```
## Summary
Kepler provides valuable energy visibility for Kubernetes clusters, even without hardware power sensors. While model-based estimates aren't billing-grade accurate, they excel at:
- ✅ Comparing workload efficiency
- ✅ Identifying power-hungry applications
- ✅ Tracking consumption trends
- ✅ Cost attribution
- ✅ Sustainability reporting
- ✅ Optimizing resource allocation
For maximum accuracy, deploy on bare metal with RAPL support. For cloud environments, estimation mode provides sufficient insights for optimization and cost management.
%%
# Excalidraw Data
## Text Elements
## Drawing
```json
{
"type": "excalidraw",
"version": 2,
"source": "https://github.com/zsviczian/obsidian-excalidraw-plugin/releases/tag/2.1.4",
"elements": [
{
"id": "4y8R7iOA",
"type": "text",
"x": 118.49495565891266,
"y": -333.44393157958984,
"width": 3.8599853515625,
"height": 24,
"angle": 0,
"strokeColor": "#1e1e1e",
"backgroundColor": "transparent",
"fillStyle": "solid",
"strokeWidth": 2,
"strokeStyle": "solid",
"roughness": 1,
"opacity": 100,
"groupIds": [],
"frameId": null,
"roundness": null,
"seed": 967149026,
"version": 2,
"versionNonce": 939059582,
"isDeleted": true,
"boundElements": null,
"updated": 1713723615080,
"link": null,
"locked": false,
"text": "",
"rawText": "",
"fontSize": 20,
"fontFamily": 4,
"textAlign": "left",
"verticalAlign": "top",
"containerId": null,
"originalText": "",
"lineHeight": 1.2
}
],
"appState": {
"theme": "dark",
"viewBackgroundColor": "#ffffff",
"currentItemStrokeColor": "#1e1e1e",
"currentItemBackgroundColor": "transparent",
"currentItemFillStyle": "solid",
"currentItemStrokeWidth": 2,
"currentItemStrokeStyle": "solid",
"currentItemRoughness": 1,
"currentItemOpacity": 100,
"currentItemFontFamily": 4,
"currentItemFontSize": 20,
"currentItemTextAlign": "left",
"currentItemStartArrowhead": null,
"currentItemEndArrowhead": "arrow",
"scrollX": 583.2388916015625,
"scrollY": 573.6323852539062,
"zoom": {
"value": 1
},
"currentItemRoundness": "round",
"gridSize": null,
"gridColor": {
"Bold": "#C9C9C9FF",
"Regular": "#EDEDEDFF"
},
"currentStrokeOptions": null,
"previousGridSize": null,
"frameRendering": {
"enabled": true,
"clip": true,
"name": true,
"outline": true
}
},
"files": {}
}
```
%%