Runbook

Node Not Ready in Kubernetes Cluster

Overview

A Node Not Ready incident occurs when a node in a Kubernetes cluster stops responding or reports that it is not ready to accept workloads. The scheduler cannot place new pods on the node, and pods already running there may be evicted or become unreachable, which can disrupt service and cause downtime. Common causes include hardware faults, network problems, and configuration errors. Resolve the incident quickly so that the cluster can keep scheduling workloads and serving traffic without interruption.

Parameters

Debug

Get the status of all nodes in the Kubernetes cluster
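
A minimal sketch of this step, assuming `kubectl` is installed and configured for the affected cluster. The helper name `not_ready_nodes` is hypothetical; it prints the names of nodes whose STATUS column does not start with `Ready`.

```shell
# Hypothetical helper: list nodes whose STATUS does not start with "Ready".
# Assumes kubectl is on PATH and its current context is the affected cluster.
not_ready_nodes() {
  # Column 1 is NAME, column 2 is STATUS (e.g. Ready, NotReady,
  # Ready,SchedulingDisabled).
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}
```

Running `kubectl get nodes -o wide` first gives the full table; `not_ready_nodes` then narrows it to the nodes that need attention.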

Check the events for a specific node to see if there are any error messages
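
One way to do this, sketched under the assumption that `kubectl` has access to the cluster. The helper name `node_events` is hypothetical; it filters events to those whose involved object is the given node and sorts them by last timestamp.

```shell
# Hypothetical helper: show events recorded against a single node.
# Usage: node_events <node-name>
node_events() {
  node="${1:?usage: node_events <node-name>}"
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=${node}" \
    --sort-by=.lastTimestamp
}
```

`kubectl describe node <node-name>` is a useful companion command, since it prints the node's conditions alongside recent events.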

Check the status of the kubelet process on the node
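
A sketch of this check, assuming the kubelet runs as a systemd unit named `kubelet` and that you have a shell on the affected node (for example over SSH). The helper name `kubelet_status` is hypothetical.

```shell
# Hypothetical helper; run on the affected node itself.
# Assumes the kubelet is managed by systemd under the unit name "kubelet".
kubelet_status() {
  # --no-pager prints the status directly instead of opening a pager.
  systemctl status kubelet --no-pager
}
```

An `Active: active (running)` line suggests the process is up; `failed` or `activating (auto-restart)` suggests a crash loop worth investigating in the logs.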

Check the logs for the kubelet process on the node
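
A sketch of this step, assuming the node uses systemd and journald collects the kubelet's output. The helper name `kubelet_logs` and its default one-hour window are assumptions chosen for illustration.

```shell
# Hypothetical helper; run on the affected node itself.
# Usage: kubelet_logs [since]   (default: "1 hour ago")
kubelet_logs() {
  journalctl -u kubelet --since "${1:-1 hour ago}" --no-pager
}
```

For example, `kubelet_logs "15 min ago"` limits the output to the most recent quarter hour, which is often enough to see certificate, network, or runtime connection errors.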

Check the status of the container runtime on the node
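
A sketch of this check. It assumes the runtime is containerd running as a systemd unit; that is an assumption, not something this runbook states, so the hypothetical helper `runtime_status` accepts a different unit name as its first argument.

```shell
# Hypothetical helper; run on the affected node itself.
# Usage: runtime_status [unit]   (default unit: containerd, an assumption)
runtime_status() {
  systemctl status "${1:-containerd}" --no-pager
}
```

On nodes that use CRI-O or Docker, `runtime_status crio` or `runtime_status docker` would be the equivalent call. If `crictl` is installed, `crictl info` gives the runtime's own view of its readiness.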

Check the logs for the container runtime on the node
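
A sketch of this step under the same assumptions as the status check: a systemd-managed runtime, containerd by default. The helper name `runtime_logs` and its defaults are hypothetical.

```shell
# Hypothetical helper; run on the affected node itself.
# Usage: runtime_logs [unit] [since]
#   unit  defaults to containerd (an assumption)
#   since defaults to "1 hour ago"
runtime_logs() {
  journalctl -u "${1:-containerd}" --since "${2:-1 hour ago}" --no-pager
}
```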

Check the status of the Kubernetes control plane components
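
A sketch of this check. The first command queries the API server's `/readyz` endpoint; the second assumes a kubeadm-style cluster in which control plane pods run in `kube-system` with the label `tier=control-plane`. The helper name `control_plane_status` is hypothetical.

```shell
# Hypothetical helper: summarize control plane health.
control_plane_status() {
  # Per-check readiness report from the API server.
  kubectl get --raw '/readyz?verbose'
  # Assumes kubeadm-style static pods labelled tier=control-plane.
  kubectl get pods -n kube-system -l tier=control-plane -o wide
}
```

On managed clusters the control plane pods are typically not visible, so the second command may return nothing; the `/readyz` query still applies.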

Check the status of the Kubernetes scheduler
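
A sketch of this check, assuming a kubeadm-style cluster where the scheduler pod carries the label `component=kube-scheduler` and uses a leader-election Lease named `kube-scheduler` in `kube-system`. The helper name `scheduler_status` is hypothetical.

```shell
# Hypothetical helper: show the scheduler pod(s) and the current leader lease.
scheduler_status() {
  kubectl get pods -n kube-system -l component=kube-scheduler -o wide
  # The HOLDER column shows which instance currently holds leadership.
  kubectl get lease kube-scheduler -n kube-system
}
```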

Check the status of the Kubernetes controller manager
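
A sketch of this check, assuming a kubeadm-style cluster where the controller manager pod carries the label `component=kube-controller-manager`. The controller manager runs the node lifecycle controller, so its logs can show when and why a node was marked NotReady. The helper name `controller_manager_status` is hypothetical.

```shell
# Hypothetical helper: show controller manager pod(s) and recent log lines.
# Usage: controller_manager_status [tail-lines]   (default: 50)
controller_manager_status() {
  kubectl get pods -n kube-system -l component=kube-controller-manager -o wide
  kubectl logs -n kube-system -l component=kube-controller-manager \
    --tail="${1:-50}"
}
```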

A possible root cause is insufficient resources (memory, disk space, or process IDs) on the Kubernetes node, causing it to become unresponsive.
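
To test the resource-exhaustion hypothesis, one option is to read the node's pressure conditions. The sketch below assumes `kubectl` access; the helper name `node_pressure` is hypothetical. It prints every condition whose type ends in `Pressure` (MemoryPressure, DiskPressure, PIDPressure) and whose status is `True`.

```shell
# Hypothetical helper: print the pressure conditions currently True on a node.
# Usage: node_pressure <node-name>
node_pressure() {
  node="${1:?usage: node_pressure <node-name>}"
  # Emit one "Type=Status" line per node condition, then keep only the
  # *Pressure conditions that are True.
  kubectl get node "$node" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' |
    awk -F= '$1 ~ /Pressure$/ && $2 == "True" { print $1 }'
}
```

Empty output means no pressure condition is reported. If the cluster runs metrics-server, `kubectl top node <node-name>` adds current CPU and memory usage.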

Repair

Restart the kubelet service on the affected node so that it reconnects to the control plane, then confirm that the node returns to the Ready state.
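
A sketch of the repair, assuming the kubelet is a systemd unit and that you have `sudo` on the node. Both helper names are hypothetical: `restart_kubelet` runs on the affected node, and `wait_for_ready` runs from any machine with `kubectl` access.

```shell
# Hypothetical helper; run on the affected node itself.
# Restarts the kubelet and reports whether the unit is active afterwards.
restart_kubelet() {
  sudo systemctl restart kubelet && systemctl is-active kubelet
}

# Hypothetical helper; run wherever kubectl is configured.
# Usage: wait_for_ready <node-name> [timeout]   (default timeout: 120s)
wait_for_ready() {
  node="${1:?usage: wait_for_ready <node-name> [timeout]}"
  # Blocks until the node's Ready condition is True or the timeout expires.
  kubectl wait --for=condition=Ready "node/${node}" --timeout="${2:-120s}"
}
```

If `wait_for_ready` times out, the restart alone did not resolve the problem and the Debug steps are worth repeating.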
