Implementing Kubernetes Resource Limits and Quotas

Michael Levan
4 min read · Oct 20, 2022

Although several areas of tech are abstracted away from us and most pieces are software-defined when it comes to Kubernetes, there's still an underlying infrastructure, whether it's infrastructure you manage yourself or a cloud-based Kubernetes service like AKS, EKS, or GKE. Servers still exist, and they're what runs the Kubernetes platform.

Because of that, resource consumption for things like CPU, memory, and storage very much exists and needs to be accounted for.

In this blog post, you'll learn how to implement resource limits and quotas, why they're important, and where CPU rate limits fit in.

Why Resource Limits and Quotas

When you're working with a team, there could be:

  • multi-tenant or single-tenant Kubernetes clusters
  • extremely large or extremely small servers
  • 10 apps or 1,000 apps running on the cluster
  • fixed nodes or auto-scaling nodes

Thinking about the above, there's a lot that can go wrong when a cluster runs out of resources (CPU, memory, storage, etc.): Pods can crash, users can get booted off the application, servers can panic and shut down, alerts fire, and, worst of all, the DevOps team is in for a long night.

The goal with resource limits and quotas is for each application to get its "fair share" of resources: no more than it needs and no less.

Resource Limits for Pods

When you attempt to create a Pod, the kube-scheduler looks at which worker nodes in the Kubernetes cluster have enough resources to host the Pod(s). Once the kube-scheduler finds a suitable node, the Pod gets deployed there. When you set resource limits, the kubelet enforces them and doesn't allow your Pod(s) to go above the limit.

Below is an example Kubernetes Manifest with resource limits. Notice how this is a standard Manifest, but there's a resources map that contains requests and limits for memory and CPU. requests tells the scheduler the minimum resources to set aside for the Pod, and limits tells the Pod "you can't go above this amount of memory and CPU".

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginxdeployment
  replicas: 2
  template:
    metadata:
      labels:
        app: nginxdeployment
    spec:
      containers:
        - name: nginxdeployment
          image: nginx:latest
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 80
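
Assuming the manifest above is saved as nginx-deployment.yaml (the filename is just an example), you can apply it and verify that the requests and limits made it into the Pod template:

kubectl apply -f nginx-deployment.yaml
kubectl describe deployment nginx-deployment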

When setting up resource limits for Pods, remember to take horizontal and vertical auto-scaling into account. Horizontal auto-scaling creates more replicas of the same Pod. Vertical auto-scaling increases the resources (memory, CPU, etc.) of an existing Pod. Both are important to keep in mind when architecting how auto-scaling will work in your Kubernetes environment, as you don't want scaling to be blocked by resource limits.
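
For example, here's a minimal sketch of a HorizontalPodAutoscaler (autoscaling/v2) targeting the nginx-deployment from the earlier manifest. It assumes a metrics pipeline such as metrics-server is installed, and the name, replica counts, and utilization target are placeholder values. Note that the HPA calculates CPU utilization as a percentage of the Pod's CPU request, which is another reason to set requests thoughtfully.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa                  # example name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment         # the Deployment from the earlier manifest
  minReplicas: 2
  maxReplicas: 10                  # placeholder ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out past 70% of the CPU request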

Applying Quotas at the Namespace Level

As you learned in the previous section, you can set up resource limits per Pod. Although that can be practical with a few applications, it doesn't scale well when you have many applications to manage. Because of that, setting quotas at the Namespace level often makes more sense for organizations. You do this with the ResourceQuota object/spec in a Kubernetes Manifest.

Common scenarios for applying quotas at the Namespace level with the ResourceQuota object/spec are (see the sketch after this list):

  • Control the amount of compute resources that are available to a specific namespace
  • Limit the number of API objects that can be created within a namespace, like Pods and Services
  • Cap the number of LoadBalancer Services that can be created, since cloud load balancers can get expensive
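
As a rough sketch, a single ResourceQuota can cover all three scenarios at once. Every name and number below is a placeholder to adjust for your environment, and the Namespace team-a is hypothetical:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota                # example name
  namespace: team-a               # hypothetical Namespace
spec:
  hard:
    requests.cpu: "4"             # total CPU requested across the Namespace
    requests.memory: 8Gi          # total memory requested across the Namespace
    pods: "20"                    # cap on the number of Pods
    services: "10"                # cap on the number of Services
    services.loadbalancers: "2"   # cap on LoadBalancer Services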

Applying quotas is great for protecting your environment as well. If an application has a memory leak, it will keep consuming RAM until the system crashes. With quotas in place, that one application can't take the rest of the cluster down with it, and you can get ahead of the problem much faster.

Below is an example Kubernetes Manifest that you can use to create a ResourceQuota in a Namespace. The Namespace in this example is called golangapp.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: memorylimit
  namespace: golangapp
spec:
  hard:
    requests.memory: 512Mi
    limits.memory: 512Mi
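
To try it out, create the Namespace, apply the quota, and inspect it (assuming the manifest above is saved as resourcequota.yaml, which is just an example filename):

kubectl create namespace golangapp
kubectl apply -f resourcequota.yaml
kubectl describe resourcequota memorylimit --namespace golangapp

The describe output shows how much of the quota is currently used versus the hard limit.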

When setting up Resource Quotas in a Namespace, remember that there should be a fair amount of architecture design and decision-making around the process, because Pods won't deploy once the quota has been met. For example, if you limit memory to 512Mi but try to give a Pod 2GB of memory, you'll see an error similar to the one below.

Error from server (Forbidden): error when creating "name_of_manifest.yaml": pods "app name" is forbidden: exceeded quota: memorylimit

The Need For CPU Rate Limits

In the last section of this blog post, I want to bring up a controversial topic: CPU rate limits.

There are two primary reasons to think about CPU rate limits.

The first is benchmarks. If you're benchmarking an application, its resource consumption is going to skyrocket. Because of that, you want to ensure that you don't give your Pods unlimited resources.

The second is language-specific needs. For example, if you're using an older version of Java (around version 8u131 or below), the JVM is unaware of container resource limits and attempts to consume more memory and CPU than the container is actually given.
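
As a hedged sketch, this is what pinning down resources for a legacy Java workload might look like. The image and JAVA_OPTS wiring are assumptions, not a prescription; on 8u131 and later, the experimental flags below tell the JVM to respect the container's memory limit.

apiVersion: v1
kind: Pod
metadata:
  name: legacy-java-app            # example name
spec:
  containers:
    - name: legacy-java-app
      image: registry.example.com/legacy-java:8   # placeholder image
      env:
        - name: JAVA_OPTS          # assumes the app's entrypoint reads JAVA_OPTS
          value: "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
      resources:
        requests:
          memory: "512Mi"
          cpu: "500m"
        limits:
          memory: "1Gi"            # hard cap so the JVM can't exhaust the node
          cpu: "1"                 # the CPU limit this section is debating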

There definitely isn't a huge set of use cases for CPU rate limits. However, a few do exist, and you should be prepared to handle them if the need arises.

