This incident occurs when the percentage of allocatable CPU still available for container limits in a Kubernetes cluster is low. It is triggered when containers use more CPU than their configured requests and push up against their limits, eventually exhausting the cluster's CPU resources. This can disrupt services and degrade the performance of the cluster, so the incident requires immediate attention to prevent further loss of service quality.
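The following minimal Python sketch shows how the metric can be computed. It assumes the alert is defined as the share of allocatable CPU not yet claimed by container CPU limits, summed across the cluster: 100 * (allocatable - sum of limits) / allocatable. Confirm this against the actual alert rule. The function names and the two-node sample cluster are hypothetical.

```python
# Sketch only: assumes the alert metric is
#   100 * (cluster allocatable CPU - sum of container CPU limits) / cluster allocatable CPU
# Function names and sample values are hypothetical.


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "250m", "2" or "0.5" to millicores."""
    q = quantity.strip()
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def available_cpu_for_limits_pct(node_allocatable: list[str], container_limits: list[str]) -> float:
    """Percentage of allocatable CPU not yet claimed by CPU limits.

    Negative when the limits are overcommitted (sum of limits > allocatable).
    """
    allocatable = sum(parse_cpu(q) for q in node_allocatable)
    limits = sum(parse_cpu(q) for q in container_limits)
    if allocatable == 0:
        raise ValueError("cluster reports no allocatable CPU")
    return 100.0 * (allocatable - limits) / allocatable


# Hypothetical cluster: two nodes with 4 allocatable cores each, three limited containers.
print(available_cpu_for_limits_pct(["4", "4"], ["2", "1500m", "3"]))  # 18.75
```

A value near zero or below means the nodes could not supply every container's full limit at the same time, so simultaneous bursts would contend for CPU.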
Parameters
Debug
Check the Kubernetes cluster status
Check the status of the Kubernetes nodes
Check the status of the Kubernetes pods
Check the CPU capacity and allocatable CPU of the Kubernetes nodes
Check the CPU resources for the Kubernetes pods
Check the resource limits for the Kubernetes pods
Check the CPU usage for the Kubernetes pods
Check the resource requests for the Kubernetes pods
Check the CPU usage of the Kubernetes nodes
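The pod-level checks above (requests, limits and usage) can be cross-referenced with a short Python sketch. It assumes you have captured the output of `kubectl get pods -o json` and `kubectl top pods --no-headers` for the same namespace. Note that `kubectl top` needs the Metrics API, which is typically provided by metrics-server. Because `kubectl top pods` reports one total per pod, the sketch sums requests and limits across each pod's containers. The helper names, the 90% near-limit threshold and the sample pods are hypothetical.

```python
import json


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "250m" or "1" to millicores."""
    q = quantity.strip()
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def pod_cpu_totals(pods_json: str) -> dict[str, tuple[int, int]]:
    """Sum CPU requests and limits per pod from `kubectl get pods -o json` output.

    Containers without a request or limit contribute 0 millicores.
    """
    totals = {}
    for pod in json.loads(pods_json)["items"]:
        requests = limits = 0
        for container in pod["spec"]["containers"]:
            resources = container.get("resources", {})
            requests += parse_cpu(resources.get("requests", {}).get("cpu", "0"))
            limits += parse_cpu(resources.get("limits", {}).get("cpu", "0"))
        totals[pod["metadata"]["name"]] = (requests, limits)
    return totals


def pod_cpu_usage(top_output: str) -> dict[str, int]:
    """Parse `kubectl top pods --no-headers` lines: NAME CPU(cores) MEMORY(bytes)."""
    usage = {}
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            usage[fields[0]] = parse_cpu(fields[1])
    return usage


def flag_pods(pods_json: str, top_output: str, near_limit: float = 0.9) -> list[str]:
    """Report pods whose usage is near their CPU limit or above their CPU request."""
    findings = []
    usage = pod_cpu_usage(top_output)
    for name, (request, limit) in sorted(pod_cpu_totals(pods_json).items()):
        used = usage.get(name)
        if used is None:
            continue  # no metrics for this pod (for example, not running yet)
        if limit and used >= near_limit * limit:
            findings.append(f"{name}: {used}m used, limit {limit}m")
        elif request and used > request:
            findings.append(f"{name}: {used}m used, request {request}m")
    return findings


# Hypothetical sample data standing in for real kubectl output.
pods = json.dumps({"items": [
    {"metadata": {"name": "api"}, "spec": {"containers": [
        {"name": "app", "resources": {"requests": {"cpu": "250m"}, "limits": {"cpu": "500m"}}}]}},
    {"metadata": {"name": "worker"}, "spec": {"containers": [
        {"name": "app", "resources": {"requests": {"cpu": "1"}, "limits": {"cpu": "2"}}}]}},
]})
top = "api      470m   90Mi\nworker   1200m  300Mi\n"
for finding in flag_pods(pods, top):
    print(finding)
```

Pods flagged for being near their limit are likely being throttled. Pods flagged only for exceeding their request are borrowing spare node CPU that other workloads may need.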
Possible cause: heavy load on the Kubernetes cluster has caused CPU usage to spike up to the configured CPU limits.
Possible cause: misconfigured Kubernetes resources, such as incorrect CPU limit values, or too little CPU capacity allocated to the cluster.
Repair
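Increase the CPU limits for the affected Kubernetes pods: CPU limits that are too low for the workload cause throttling and may cause this incident, so raising them can help. Because higher limits also claim more of the cluster's allocatable CPU, check that the nodes have headroom before raising them.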
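The following minimal sketch sizes a new limit and checks its effect on the node's limit headroom. The 25% headroom over peak usage and the 50m rounding step are hypothetical policy choices. The workload, container and namespace names are also hypothetical. The sketch only builds the `kubectl set resources` command and does not run it.

```python
import math


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "480m" or "1" to millicores."""
    q = quantity.strip()
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def proposed_limit_millicores(peak_usage: str, headroom: float = 0.25, step: int = 50) -> int:
    """Peak observed usage plus a hypothetical headroom margin, rounded up to `step` millicores."""
    target = parse_cpu(peak_usage) * (1 + headroom)
    return math.ceil(target / step) * step


def limit_headroom_after_change(node_allocatable: str, node_limits_m: int,
                                old_limit_m: int, new_limit_m: int) -> int:
    """Millicores of node allocatable CPU left unclaimed by limits after the change.

    A negative result means the node's limits would be overcommitted.
    """
    return parse_cpu(node_allocatable) - (node_limits_m - old_limit_m + new_limit_m)


def set_limit_command(workload: str, container: str, limit_m: int, namespace: str) -> list[str]:
    """Build (but do not run) a `kubectl set resources` command for the new CPU limit."""
    return [
        "kubectl", "set", "resources", workload,
        "-n", namespace,
        "-c", container,
        f"--limits=cpu={limit_m}m",
    ]


new_limit = proposed_limit_millicores("480m")
print(new_limit)  # 600
print(limit_headroom_after_change("4", 3500, 500, new_limit))  # 400
print(" ".join(set_limit_command("deployment/api", "app", new_limit, "prod")))
```

For a Deployment, running the printed command changes the pod template, which rolls out new pods with the updated limit.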
Add more CPU resources to the Kubernetes cluster: If the cluster is already running at full capacity and the CPU limits are set appropriately, you may need to add nodes, or move to nodes with more CPU, to prevent this incident from recurring.
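To estimate how much capacity to add, the sketch below computes the smallest number of extra nodes that keeps total CPU limits under a target share of allocatable CPU. The 80% target and the sample figures are hypothetical. Use the allocatable CPU of the node type you plan to add rather than its capacity, because allocatable excludes CPU reserved for the system and the kubelet.

```python
import math


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "30" or "3500m" to millicores."""
    q = quantity.strip()
    if q.endswith("m"):
        return int(q[:-1])
    return round(float(q) * 1000)


def nodes_to_add(total_limits: str, current_allocatable: str, node_allocatable: str,
                 target_limit_pct: float = 80.0) -> int:
    """Smallest number of extra nodes so that CPU limits use at most target_limit_pct of allocatable CPU."""
    limits = parse_cpu(total_limits)
    allocatable = parse_cpu(current_allocatable)
    per_node = parse_cpu(node_allocatable)
    required = limits * 100.0 / target_limit_pct  # allocatable CPU needed to meet the target
    missing = required - allocatable
    if missing <= 0:
        return 0
    return math.ceil(missing / per_node)


# Hypothetical cluster: 30 cores of limits on 32 allocatable cores, adding 4-core nodes.
print(nodes_to_add("30", "32", "4"))  # 2
```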