This incident occurs when a Kubernetes node reports a status other than Ready, such as NotReady or Unknown. In that state the scheduler cannot place pods on the node because of an underlying problem with the node's health, which can reduce the availability and performance of applications running on the cluster. Resolve it promptly to restore normal cluster operation.
Parameters
Debug
List all nodes in the Kubernetes cluster
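A minimal sketch of this step, assuming `kubectl` is installed and configured for the affected cluster. The function name `list_nodes` is illustrative and not part of any tool.

```shell
# List every node with its status, roles, version, and addresses.
# Assumes kubectl is already pointed at the affected cluster.
list_nodes() {
  kubectl get nodes -o wide
}
```

Call `list_nodes` and look for nodes whose STATUS column is anything other than `Ready`.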
Check the status of a specific node <node-name>
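One way to do this, assuming `kubectl` access to the cluster. The helper name `describe_node` is hypothetical, and the node name is passed as the first argument.

```shell
# Show conditions, capacity, allocated resources, and recent events for one node.
# $1 is the node name as shown by "kubectl get nodes".
describe_node() {
  kubectl describe node "$1"
}
```

For example, `describe_node worker-1` (where `worker-1` is a placeholder node name) prints the node's Conditions section, which usually states why the node is not Ready.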
Check the events associated with a specific node <node-name>
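A possible sketch for this step, assuming `kubectl` access. It filters events by the object they refer to. The helper name `node_events` is hypothetical.

```shell
# List events whose involved object is the given node, oldest first.
# $1 is the node name.
node_events() {
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

Events are retained only for a limited time, so an empty result does not prove that nothing happened on the node.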
Check the health status of the kubelet service on the node <node-name>
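A minimal sketch, assuming the node runs systemd and the kubelet unit is named `kubelet` (both are assumptions that may not hold on every distribution). Run it on the affected node itself, for example over SSH. The helper name `kubelet_status` is illustrative.

```shell
# Show whether the kubelet unit is active, along with its most recent log lines.
# Run on the affected node, not on your workstation.
kubelet_status() {
  systemctl --no-pager status kubelet
}
```

An `inactive` or `failed` state here is a common reason for a node to stop reporting Ready.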
Check the logs for the kubelet service on the node <node-name>
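A sketch of this step, assuming the kubelet logs to the systemd journal under the unit name `kubelet`. The helper name `kubelet_logs` and the default time window are illustrative choices.

```shell
# Print recent kubelet logs from the systemd journal.
# $1 is an optional start time understood by journalctl (default: "1 hour ago").
# Run on the affected node.
kubelet_logs() {
  journalctl -u kubelet --no-pager --since "${1:-1 hour ago}"
}
```

For example, `kubelet_logs "30 min ago"` narrows the output to the last half hour.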
Check the status of the Docker service on the node <node-name>
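A minimal sketch, assuming a systemd-managed node where the container runtime unit is named `docker`. On nodes that use a different runtime, such as containerd, pass that unit name instead. The helper name `runtime_status` is hypothetical.

```shell
# Show the state of the container runtime service.
# $1 is the systemd unit name (default: docker). Run on the affected node.
runtime_status() {
  systemctl --no-pager status "${1:-docker}"
}
```

Call `runtime_status` for Docker, or `runtime_status containerd` on nodes that do not run Docker.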
Check the logs for the Docker service on the node <node-name>
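A sketch of this step, assuming the runtime logs to the systemd journal. The unit name defaults to `docker`. The helper name `runtime_logs` and the default time window are illustrative.

```shell
# Print recent container runtime logs from the systemd journal.
# $1 is the unit name (default: docker); $2 is an optional start time
# understood by journalctl (default: "1 hour ago"). Run on the affected node.
runtime_logs() {
  journalctl -u "${1:-docker}" --no-pager --since "${2:-1 hour ago}"
}
```

For example, `runtime_logs docker "2 hours ago"` widens the window to two hours.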
Check for network or connectivity issues between the Kubernetes nodes and the control plane.
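One rough way to test this from the affected node is to request the API server's readiness endpoint. This is only a sketch: the helper name `check_apiserver` is hypothetical, port 6443 is a common default rather than a certainty, and some clusters reject unauthenticated requests to `/readyz`.

```shell
# Probe the API server from the affected node.
# $1 is the API server host or IP; $2 is the port (assumed default: 6443).
# --insecure skips certificate verification, which is acceptable only for a
# quick reachability check.
check_apiserver() {
  curl --silent --show-error --insecure --max-time 5 \
    "https://$1:${2:-6443}/readyz"
}
```

A timeout or "connection refused" error points to a network problem. An HTTP authorization error still shows that the node can reach the control plane.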
Check for resource constraints on the node caused by excessive resource utilization by the applications running on it.
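A minimal sketch for spotting resource pressure, assuming `kubectl` access. It prints each node condition with its current status. The helper name `node_conditions` is hypothetical.

```shell
# Print every condition reported by the node as TYPE=STATUS, one per line.
# $1 is the node name.
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}
```

A status of `True` for `MemoryPressure`, `DiskPressure`, or `PIDPressure` suggests resource exhaustion on the node. If the cluster runs metrics-server, `kubectl top node` gives current CPU and memory usage as well.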
Repair
Check the health of the affected Kubernetes node. Identify and fix any underlying issues with the node, such as hardware failure or resource exhaustion.
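If the fix requires maintenance on the node, one common approach is to drain it first and return it to service afterwards. The sketch below assumes `kubectl` access and a reasonably recent `kubectl` that supports `--delete-emptydir-data`. The helper names are hypothetical, and the drain flags should be reviewed against your workloads because draining evicts pods.

```shell
# Mark the node unschedulable and evict its pods before maintenance.
# $1 is the node name. DaemonSet pods are left in place; emptyDir data is lost.
drain_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Allow scheduling on the node again once the underlying issue is fixed.
restore_node() {
  kubectl uncordon "$1"
}
```

For example, run `drain_node worker-1`, repair the node, confirm that it reports Ready, then run `restore_node worker-1` (with `worker-1` as a placeholder name).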
Learn more
Related Runbooks
Check out these related runbooks to help you debug and resolve similar issues.