---
Title: Kubernetes Best Practices: Resource Requests and Limits
Author: Google Cloud Blog
Tags: readwise, articles
date: 2024-01-30
---

# Kubernetes Best Practices: Resource Requests and Limits

![rw-book-cover](https://storage.googleapis.com/gweb-cloudblog-publish/images/Google_Blog_Containers_08_Toz0BRc.max-2200x2200.jpg)

URL:: https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits
Author:: Google Cloud Blog

## AI-Generated Summary

While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow. Adding requests and limits to your Pods and Namespaces only takes a little extra effort, and can save you from running into many headaches down the line.

## Highlights

> Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted. ([View Highlight](https://read.readwise.io/read/01fex9y8hxfdrts77wzpzn45ee))

> It is important to remember that the limit can never be lower than the request. If you try this, Kubernetes will throw an error and won’t let you run the container. ([View Highlight](https://read.readwise.io/read/01fex9yme6ypxr9ex7xy8n3wxx))

> CPU resources are defined in millicores. If your container needs two full cores to run, you would put the value “2000m”. If your container only needs ¼ of a core, you would put a value of “250m”. ([View Highlight](https://read.readwise.io/read/01fexa199syxdm5aa2xp9ep6gh))
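To make the notation concrete, here is a minimal sketch of a Pod spec using these fields. The `resources.requests` and `resources.limits` blocks are the standard Kubernetes API the highlights describe; the Pod name, container name, image, and values are placeholders.

```yaml
# Minimal sketch of per-container requests and limits.
# Names, image, and values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: demo-container
    image: nginx:1.25
    resources:
      requests:
        cpu: 250m      # guaranteed a quarter of a core
        memory: 64Mi   # guaranteed 64 mebibytes
      limits:
        cpu: 500m      # throttled above half a core
        memory: 128Mi  # terminated if usage goes past this
```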
> One thing to keep in mind about CPU requests is that if you put in a value larger than the core count of your biggest node, your pod will never be scheduled. Let’s say you have a pod that needs four cores, but your Kubernetes cluster is composed of dual-core VMs—your pod will never be scheduled! ([View Highlight](https://read.readwise.io/read/01fexa1exmmb2jb9sbdaxg3eb3))

> Unless your app is specifically designed to take advantage of multiple cores (scientific computing and some databases come to mind), it is usually a best practice to keep the CPU request at ‘1’ or below, and run more replicas to scale it out. This gives the system more flexibility and reliability. ([View Highlight](https://read.readwise.io/read/01fexa1hptwwa38sb2dt430kgb))

> It’s when it comes to CPU limits that things get interesting. CPU is considered a “compressible” resource. If your app starts hitting your CPU limits, Kubernetes starts throttling your container. This means the CPU will be artificially restricted, giving your app potentially worse performance! However, it won’t be terminated or evicted. You can use a [liveness health check](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes) to make sure performance has not been impacted. ([View Highlight](https://read.readwise.io/read/01fexa1rv8c0r59k9gmdrg7ndv))
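As a sketch of the liveness-check idea mentioned in the highlight above: a probe can surface a container that throttling has made too slow to respond. This fragment slots under a container entry like the one in the earlier Pod sketch; the `/healthz` path, port, and timings are assumptions, not values from the article.

```yaml
# Hypothetical liveness probe: if throttling slows responses past the
# timeout, the probe fails and the kubelet restarts the container.
livenessProbe:
  httpGet:
    path: /healthz   # assumed health endpoint
    port: 8080       # assumed container port
  initialDelaySeconds: 15
  periodSeconds: 10
  timeoutSeconds: 2
```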
> Memory resources are defined in bytes. Normally, you give a [mebibyte](https://en.wikipedia.org/wiki/Byte#Multiple-byte_units) value for memory (this is basically the same thing as a megabyte), but you can give anything from [bytes to petabytes](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory). ([View Highlight](https://read.readwise.io/read/01fexa20p7jya146vy1fn8thp5))

> Just like CPU, if you put in a memory request that is larger than the amount of memory on your nodes, the pod will never be scheduled. ([View Highlight](https://read.readwise.io/read/01fexa25wwbbfpa7j5d858ngww))

> Unlike CPU resources, memory cannot be compressed. Because there is no way to throttle memory usage, if a container goes past its memory limit it will be terminated. If your pod is managed by a [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/), [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/), or another type of controller, then the controller spins up a replacement. ([View Highlight](https://read.readwise.io/read/01fexa2907q1xb6yvks8xpp5sg))

> In an ideal world, Kubernetes’ Container settings would be good enough to take care of everything, but the world is a dark and terrible place. People can easily forget to set the resources, or a rogue team can set the requests and limits very high and take up more than their fair share of the cluster. ([View Highlight](https://read.readwise.io/read/01fexa3z4geekb7ehr6qjcv8k8))

> To prevent these scenarios, you can set up ResourceQuotas and LimitRanges at the Namespace level. ([View Highlight](https://read.readwise.io/read/01fexa405bmrpeggct3j83t5wz))

> ![](https://storage.googleapis.com/gweb-cloudblog-publish/images/gcp-resourcequota3qo9.max-300x300.PNG) ([View Highlight](https://read.readwise.io/read/01fexa4338j1a916pgcn8n5vr6))

> **requests.cpu** is the maximum combined CPU requests in millicores for all the containers in the Namespace. ([View Highlight](https://read.readwise.io/read/01fexa4bp55n67fyh9mpwfq86r))

> **requests.memory** is the maximum combined Memory requests for all the containers in the Namespace. ([View Highlight](https://read.readwise.io/read/01fexa4ffv968q5jq71e0xbrjb))

> **limits.cpu** is the maximum combined CPU limits for all the containers in the Namespace. It’s just like requests.cpu but for the limit. ([View Highlight](https://read.readwise.io/read/01fexa4pngdqd6x6xh8rj00ye8))

> **limits.memory** is the maximum combined Memory limits for all containers in the Namespace. It’s just like requests.memory but for the limit. ([View Highlight](https://read.readwise.io/read/01fexa4r2tgdrtwskh45tcr46b))
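The screenshot above shows a ResourceQuota manifest. Here is a representative sketch covering the four fields just described; the `hard` keys are the standard Kubernetes quota names, while the object name, namespace, and values are illustrative, not the ones in the image.

```yaml
# Illustrative ResourceQuota; each cap applies to the Namespace as a whole.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: demo-quota   # placeholder name
  namespace: demo    # placeholder namespace
spec:
  hard:
    requests.cpu: "500m"     # cap on the sum of all CPU requests
    requests.memory: 100Mi   # cap on the sum of all memory requests
    limits.cpu: "700m"       # cap on the sum of all CPU limits
    limits.memory: 500Mi     # cap on the sum of all memory limits
```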
> You can also create a LimitRange in your Namespace. Unlike a Quota, which looks at the Namespace as a whole, a LimitRange applies to an individual container. This can help prevent people from creating super tiny or super large containers inside the Namespace. ([View Highlight](https://read.readwise.io/read/01fexa4wycanxpbyjd7v2ye758))

> ![](https://storage.googleapis.com/gweb-cloudblog-publish/images/gcp-limit-range228w.max-400x400.PNG) ([View Highlight](https://read.readwise.io/read/01fexa4y6jzv4x5n6ej3xhzmvw))
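Likewise, the screenshot above shows a LimitRange manifest. A representative sketch of per-container bounds follows; the structure is the standard LimitRange API, and the name, namespace, and values are illustrative.

```yaml
# Illustrative LimitRange; bounds apply to each individual container.
apiVersion: v1
kind: LimitRange
metadata:
  name: demo-limit-range  # placeholder name
  namespace: demo         # placeholder namespace
spec:
  limits:
  - type: Container
    max:                  # no container may set limits above these
      cpu: "1"
      memory: 512Mi
    min:                  # ...or requests below these
      cpu: 50m
      memory: 32Mi
    default:              # limit given to containers that set none
      cpu: 200m
      memory: 256Mi
    defaultRequest:       # request given to containers that set none
      cpu: 100m
      memory: 128Mi
```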
> At this point [once the limits on a node add up to more than its actual capacity], Kubernetes goes into something called an “[overcommitted state](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md#qos-classes).” Here is where things get interesting. Because CPU can be compressed, Kubernetes will make sure your containers get the CPU they requested and will throttle the rest. Memory cannot be compressed, so Kubernetes needs to start making decisions on what containers to terminate if the Node runs out of memory. ([View Highlight](https://read.readwise.io/read/01fexa6xb7qwbph28v8df81ban))

> Let’s imagine a scenario where we have a machine that is running out of memory. What will Kubernetes do? ([View Highlight](https://read.readwise.io/read/01fexa7bc1t0ybxqssgk7cg501))

> Kubernetes looks for Pods that are using more resources than they requested. If your Pod’s containers have no requests, then by default they are using more than they requested, so these are prime candidates for termination. Other prime candidates are containers that have gone over their request but are still under their limit. ([View Highlight](https://read.readwise.io/read/01fexa79b20gywfz1md4bsr60y))

> If Kubernetes finds multiple pods that have gone over their requests, it will then rank these by the Pod’s [priority](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/pod-priority-api.md), and terminate the lowest-priority pods first. If all the Pods have the same priority, Kubernetes terminates the Pod that’s the most over its request. ([View Highlight](https://read.readwise.io/read/01fexa7gb8rc531qr7qgbc6mpb))

> While your Kubernetes cluster might work fine without setting resource requests and limits, you will start running into stability issues as your teams and projects grow. Adding requests and limits to your Pods and Namespaces only takes a little extra effort, and can save you from running into many headaches down the line! ([View Highlight](https://read.readwise.io/read/01fexa7qh4e4xfahgqcxad5kdg))