Node Not Ready in Kubernetes Cluster is an incident type that occurs when a node in a Kubernetes cluster stops responding or reports that it is not ready to run workloads. While a node is NotReady, the scheduler stops placing new pods on it, and pods already running there may be evicted and rescheduled elsewhere. This can disrupt services and cause downtime, especially if the remaining nodes lack spare capacity. Common causes include hardware failures, network problems, resource exhaustion, and configuration errors. Resolving the incident quickly is essential to keep the cluster functioning correctly and services available.
Debug
Get the status of all nodes in the Kubernetes cluster
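A minimal sketch of this step in Python, assuming kubectl is installed and configured for the affected cluster. `kubectl get nodes -o json` returns a NodeList in which each node's `status.conditions` includes a `Ready` entry whose status is "True", "False", or "Unknown". The helper names `fetch_nodes` and `not_ready_nodes` are hypothetical.

```python
import json
import subprocess


def fetch_nodes() -> dict:
    """Run `kubectl get nodes -o json` and decode the NodeList it prints."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def not_ready_nodes(node_list: dict) -> dict:
    """Map each node whose Ready condition is not "True" to that condition's status."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A node with no Ready condition at all is reported as "Unknown".
        status = ready.get("status", "Unknown") if ready else "Unknown"
        if status != "True":
            result[name] = status
    return result
```

Call it as `not_ready_nodes(fetch_nodes())`. A status of "Unknown" usually means the node controller has stopped receiving status updates from the kubelet, while "False" means the kubelet itself is reporting a problem.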
Check the events for the affected node for warnings or error messages
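One way to script this, assuming kubectl access: the events API accepts field selectors on `involvedObject.kind` and `involvedObject.name`, and the sketch keeps only events of type Warning. The function names are hypothetical; `kubectl describe node <name>` shows the same events interactively.

```python
import json
import subprocess


def node_events_command(node: str) -> list:
    """Build a kubectl command listing events whose involved object is the given node."""
    selector = f"involvedObject.kind=Node,involvedObject.name={node}"
    return ["kubectl", "get", "events", "--all-namespaces",
            "--field-selector", selector, "-o", "json"]


def fetch_node_events(node: str) -> dict:
    """Run the command above and decode the EventList it prints."""
    out = subprocess.run(node_events_command(node),
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def warning_events(event_list: dict) -> list:
    """Return (reason, message) pairs for events of type Warning."""
    return [
        (e.get("reason", ""), e.get("message", ""))
        for e in event_list.get("items", [])
        if e.get("type") == "Warning"
    ]
```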
Check the status of the kubelet process on the node
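A minimal sketch, assuming kubelet runs as a systemd unit named `kubelet` (as in kubeadm-based installs) and that the code runs on the affected node. `systemctl show` prints KEY=VALUE lines; the sketch reads ActiveState, SubState, and NRestarts (NRestarts needs a reasonably recent systemd). Helper names are hypothetical.

```python
import subprocess


def parse_systemctl_show(output: str) -> dict:
    """Parse `systemctl show` KEY=VALUE lines into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def kubelet_unit_properties() -> dict:
    """Query the kubelet unit's state; must run on the node itself."""
    out = subprocess.run(
        ["systemctl", "show", "kubelet", "--property=ActiveState,SubState,NRestarts"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_systemctl_show(out)


def looks_healthy(props: dict) -> bool:
    """Treat the unit as healthy only when it is active and running."""
    return props.get("ActiveState") == "active" and props.get("SubState") == "running"
```

An ActiveState of "activating" with SubState "auto-restart" and a growing NRestarts usually points to a kubelet crash loop, whose cause should appear in the kubelet logs.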
Check the logs for the kubelet process on the node
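Assuming the same systemd unit, kubelet logs can be read with journalctl. Kubelet uses the klog format, where each message starts with a severity letter (I, W, E, or F) followed by the date and time, so a regular expression can pick out the Error and Fatal lines. The one-hour window and the function names are illustrative assumptions.

```python
import re
import subprocess

# klog header: severity letter, MMDD, then a timestamp, e.g. "E0612 10:04:05.123456".
# Only E (error) and F (fatal) are matched here.
KLOG_ERROR = re.compile(r"\b[EF]\d{4} \d{2}:\d{2}:\d{2}\.\d+")


def kubelet_journal(since: str = "1 hour ago") -> str:
    """Return recent kubelet log lines from the systemd journal (run on the node)."""
    return subprocess.run(
        ["journalctl", "-u", "kubelet", "--since", since, "--no-pager"],
        check=True, capture_output=True, text=True,
    ).stdout


def error_lines(log_text: str) -> list:
    """Keep only lines whose klog severity is Error or Fatal."""
    return [line for line in log_text.splitlines() if KLOG_ERROR.search(line)]
```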
Check the status of the container runtime on the node
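A sketch assuming crictl is installed on the node and configured (for example via /etc/crictl.yaml) to talk to the CRI runtime, such as containerd or CRI-O. `crictl info` prints JSON whose `status.conditions` list includes RuntimeReady and NetworkReady entries with boolean statuses; a false NetworkReady generally means the network (CNI) plugin is not ready. Function names are hypothetical, and `systemctl status containerd` (or `crio`) gives the service-level view.

```python
import json
import subprocess


def runtime_info() -> dict:
    """Run `crictl info` on the node and decode its JSON output."""
    out = subprocess.run(["crictl", "info"], check=True,
                         capture_output=True, text=True).stdout
    return json.loads(out)


def failing_runtime_conditions(info: dict) -> list:
    """Return (type, reason, message) for runtime conditions whose status is false."""
    conditions = info.get("status", {}).get("conditions", [])
    return [
        (c.get("type"), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if not c.get("status", False)
    ]
```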
Check the logs for the container runtime on the node
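Assuming the runtime is containerd running as the systemd unit `containerd` (substitute `crio` for CRI-O), its journal can be filtered in a similar way. containerd writes logfmt-style lines with a `level=` field, so the sketch keeps lines at error, fatal, or panic level. The unit name, time window, and function names are assumptions.

```python
import re
import subprocess

# containerd logs look like: time="..." level=error msg="..."
LEVEL = re.compile(r"\blevel=(?P<level>\w+)")
SEVERE = {"error", "fatal", "panic"}


def runtime_journal(unit: str = "containerd", since: str = "1 hour ago") -> str:
    """Return recent log lines for the runtime unit from the systemd journal."""
    return subprocess.run(
        ["journalctl", "-u", unit, "--since", since, "--no-pager"],
        check=True, capture_output=True, text=True,
    ).stdout


def severe_lines(log_text: str) -> list:
    """Keep lines whose level= field is error, fatal, or panic."""
    result = []
    for line in log_text.splitlines():
        match = LEVEL.search(line)
        if match and match.group("level") in SEVERE:
            result.append(line)
    return result
```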
Check the status of the Kubernetes control plane components
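A hedged sketch: the API server's `/readyz?verbose` endpoint lists each readiness check as `[+]name ok` or `[-]name failed`, including etcd connectivity. When a check fails the server responds with an error status, so kubectl exits non-zero and may include the check list in its error output on stderr; the sketch therefore captures both streams instead of raising. Function names are hypothetical. If the API server is unreachable there are no checks to parse, so the exit code remains the primary signal.

```python
import re
import subprocess

# Verbose readyz lines look like "[+]ping ok" or "[-]etcd failed: reason withheld".
FAILED_CHECK = re.compile(r"\[-\](\S+?) failed")


def readyz_verbose():
    """Return (exit code, combined stdout and stderr) of the readyz query."""
    proc = subprocess.run(["kubectl", "get", "--raw", "/readyz?verbose"],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def failed_checks(output: str) -> list:
    """Return the names of readiness checks reported as failed."""
    return FAILED_CHECK.findall(output)
```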
Check the status of the Kubernetes scheduler
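On kubeadm-provisioned clusters the scheduler runs as a static pod in kube-system labelled `component=kube-scheduler`; other distributions may run it differently, so treat the label as an assumption. The sketch reports each matching pod's readiness and total container restarts. Function names are hypothetical.

```python
import json
import subprocess


def component_pods(component: str) -> dict:
    """List kube-system pods carrying the label component=<component>."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", "kube-system",
         "-l", f"component={component}", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def pod_summary(pod_list: dict) -> list:
    """Return (pod name, ready?, total container restarts) for each pod."""
    summary = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        ready = any(c.get("type") == "Ready" and c.get("status") == "True"
                    for c in status.get("conditions", []))
        restarts = sum(cs.get("restartCount", 0)
                       for cs in status.get("containerStatuses", []))
        summary.append((pod["metadata"]["name"], ready, restarts))
    return summary
```

`pod_summary(component_pods("kube-scheduler"))` checks the scheduler; passing "kube-controller-manager" checks the controller manager the same way.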
Check the status of the Kubernetes controller manager
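The controller manager and the scheduler use leader election through Lease objects in kube-system, named `kube-controller-manager` and `kube-scheduler` by default. If the lease's `renewTime` is older than `leaseDurationSeconds`, no instance is actively leading. A sketch of that check, with hypothetical function names and a fallback of 15 seconds (the components' default lease duration) when the field is absent:

```python
import json
import subprocess
from datetime import datetime, timezone


def get_lease(name: str) -> dict:
    """Fetch a leader-election Lease from kube-system as JSON."""
    out = subprocess.run(
        ["kubectl", "get", "lease", name, "-n", "kube-system", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def lease_is_stale(lease: dict, now: datetime) -> bool:
    """True if the leader has not renewed the lease within leaseDurationSeconds."""
    spec = lease.get("spec", {})
    renew = spec.get("renewTime")
    if not renew:
        return True
    # renewTime is RFC 3339 with microseconds, e.g. "2024-06-12T10:00:00.123456Z".
    renewed_at = datetime.fromisoformat(renew.replace("Z", "+00:00"))
    age = (now - renewed_at).total_seconds()
    return age > spec.get("leaseDurationSeconds", 15)
```

Usage: `lease_is_stale(get_lease("kube-controller-manager"), datetime.now(timezone.utc))`.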
Possible cause: the node has run out of resources, such as memory, disk space, or process IDs, and has become unresponsive.
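The kubelet reports resource exhaustion through the MemoryPressure, DiskPressure, and PIDPressure node conditions. A sketch, assuming the node object has already been fetched with `kubectl get node <name> -o json`; the function name is hypothetical. These conditions are only meaningful while the kubelet is still posting status: once the node controller marks the node unreachable, they typically turn "Unknown". `kubectl top node` shows current usage if metrics-server is installed.

```python
# Node conditions the kubelet sets to "True" when a resource is under pressure.
PRESSURE_TYPES = {"MemoryPressure", "DiskPressure", "PIDPressure"}


def pressure_conditions(node: dict) -> list:
    """Return the names of pressure conditions whose status is "True"."""
    return [
        c["type"]
        for c in node.get("status", {}).get("conditions", [])
        if c.get("type") in PRESSURE_TYPES and c.get("status") == "True"
    ]
```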
Repair
Restart the kubelet service on the node so that it re-establishes its connection to the control plane and resumes reporting node status.
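A sketch of the repair, assuming kubelet is a systemd unit and the restart runs as root on the affected node, while the readiness check runs wherever kubectl has cluster access. The JSONPath query reads the node's Ready condition, and `wait_until` polls it; its clock and sleep parameters exist only so the loop can be tested. All function names and the 300-second timeout are assumptions.

```python
import subprocess
import time


def restart_kubelet() -> None:
    """Restart kubelet via systemd; requires root on the affected node."""
    subprocess.run(["systemctl", "restart", "kubelet"], check=True)


def node_is_ready(name: str) -> bool:
    """Return True if the node's Ready condition is "True"; False on any kubectl error."""
    proc = subprocess.run(
        ["kubectl", "get", "node", name, "-o",
         'jsonpath={.status.conditions[?(@.type=="Ready")].status}'],
        capture_output=True, text=True,
    )
    return proc.returncode == 0 and proc.stdout.strip() == "True"


def wait_until(predicate, timeout: float = 300.0, interval: float = 10.0,
               clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll predicate() until it returns True or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For a hypothetical node named worker-1: run `restart_kubelet()` on the node, then `wait_until(lambda: node_is_ready("worker-1"))` from an admin workstation. If the node does not return to Ready, the kubelet logs usually show why.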