Runbook

Kubernetes Node Status Not OK

Overview

This incident occurs when a Kubernetes node reports a status other than Ready (for example NotReady or Unknown). The scheduler then cannot place new pods on the node because of an underlying problem with the node's health, which can reduce the availability and performance of the applications running on the cluster. Resolve the incident promptly to restore normal operation of the cluster.

Parameters

<node-name>: the name of the affected node, as it appears in the cluster's node list.

Debug

List all nodes in the Kubernetes cluster
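A minimal sketch of this step, assuming kubectl is installed and its current context points at the affected cluster. The function names list_nodes and list_not_ready_nodes are hypothetical helpers introduced here for illustration.

```shell
# Hypothetical helper: show every node with its status, roles, age, and version.
list_nodes() {
  kubectl get nodes -o wide
}

# Hypothetical helper: print "<name> <status>" for each node whose STATUS
# column is not exactly "Ready" (this also catches "Ready,SchedulingDisabled").
list_not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1, $2 }'
}
```

The names printed by list_not_ready_nodes are candidate values for <node-name> in the steps that follow.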

Check the status of a specific node <node-name>
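One way to run this check, sketched under the assumption that kubectl can reach the cluster. The helper name describe_node is hypothetical; its only argument is the node name.

```shell
# Hypothetical helper: print the node's conditions, capacity, allocated
# resources, and recent events. $1 is the node name.
describe_node() {
  kubectl describe node "${1:?usage: describe_node <node-name>}"
}
```

In the output, the Conditions section is usually the quickest place to look: it lists Ready alongside conditions such as MemoryPressure, DiskPressure, and PIDPressure, each with a reason and a last transition time.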

Check the events associated with a specific node <node-name>
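A possible form of this query, assuming kubectl access to the cluster. It filters events by the object they refer to and sorts them by time. The helper name node_events is hypothetical.

```shell
# Hypothetical helper: list events whose involved object is the given node,
# oldest first. $1 is the node name.
node_events() {
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=${1:?usage: node_events <node-name>}" \
    --sort-by=.lastTimestamp
}
```

Events are only retained by the cluster for a limited time, so an empty result does not by itself mean the node has been healthy.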

Check the health status of the kubelet service on the node <node-name>
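A sketch of this check, meant to be run on the node itself (for example over SSH). It assumes the kubelet is managed by systemd under the unit name kubelet, which may differ on some distributions. The helper names are hypothetical.

```shell
# Hypothetical helper: show the kubelet unit's state and most recent log lines.
kubelet_status() {
  systemctl status kubelet --no-pager
}

# Hypothetical helper: succeed (exit 0) only if the kubelet unit is active.
kubelet_is_active() {
  systemctl is-active --quiet kubelet
}
```

kubelet_is_active prints nothing and is convenient in scripts, for example: kubelet_is_active || echo "kubelet is not running".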

Check the logs for the kubelet service on the node <node-name>
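A sketch of this step, run on the node and assuming the kubelet logs to the systemd journal. The helper name kubelet_logs and the one-hour default window are illustrative choices, not values from this runbook.

```shell
# Hypothetical helper: print kubelet journal entries since a given time.
# $1 is an optional journalctl time expression; defaults to "1 hour ago".
kubelet_logs() {
  journalctl -u kubelet --since "${1:-1 hour ago}" --no-pager
}
```

For example, kubelet_logs "30 min ago" narrows the output to the last half hour.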

Check the status of the Docker service on the node <node-name>
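A sketch of this check, run on the node and assuming a systemd unit named docker. Kubernetes removed its built-in Docker integration (dockershim) in version 1.24, so many nodes run containerd instead. The hypothetical helper below therefore takes the unit name as an optional argument.

```shell
# Hypothetical helper: show the container runtime unit's state.
# $1 is the systemd unit name; defaults to "docker".
runtime_status() {
  systemctl status "${1:-docker}" --no-pager
}
```

On a containerd-based node the call would be runtime_status containerd.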

Check the logs for the Docker service on the node <node-name>
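A sketch of this step, run on the node and assuming the runtime logs to the systemd journal. To cut noise it shows only entries at priority err or more severe. The helper name runtime_errors, the docker default, and the one-hour window are illustrative assumptions.

```shell
# Hypothetical helper: print error-level (and worse) journal entries for the
# container runtime. $1 is the unit name (default "docker"); $2 is an optional
# journalctl time expression (default "1 hour ago").
runtime_errors() {
  journalctl -u "${1:-docker}" -p err --since "${2:-1 hour ago}" --no-pager
}
```

If this returns nothing, running the same journalctl command without -p err shows the full log for the same window.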

Possible causes

Network or connectivity issues between the Kubernetes nodes and the control plane.

Resource constraints on the node caused by excessive resource utilization by the applications running on it.
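Both possibilities can be probed from the command line. The sketch below is one way to do it and makes several assumptions: node_pressure needs kubectl access to the cluster, api_server_reachable is meant to be run on the node and needs curl, and port 6443 is only a common default for the API server. Both helper names are hypothetical.

```shell
# Hypothetical helper: print any pressure condition that is currently True
# on the node. Exits non-zero (from grep) when no pressure condition is set.
node_pressure() {
  kubectl get node "${1:?usage: node_pressure <node-name>}" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' |
    grep -E '^(MemoryPressure|DiskPressure|PIDPressure)=True$'
}

# Hypothetical helper: from the node, probe the API server's readiness endpoint.
# $1 is the API server host; $2 is the port (assumed default: 6443).
# --insecure skips TLS verification, so use this for reachability checks only.
api_server_reachable() {
  curl --silent --insecure --max-time 5 \
    "https://${1:?usage: api_server_reachable <host> [port]}:${2:-6443}/readyz"
}
```

Any HTTP response from api_server_reachable, even an authorization error, indicates that the network path to the control plane works. A timeout or connection failure points toward a connectivity problem.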

Repair

Check the health of the affected Kubernetes node. Identify and fix any underlying issues with the node, such as hardware failure or resource exhaustion.
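While the underlying issue is being fixed, a common practice (not prescribed by this runbook) is to take the node out of scheduling and move its workloads elsewhere, then return it to service afterwards. The sketch below assumes kubectl access with permission to cordon and drain. The helper names isolate_node and restore_node are hypothetical.

```shell
# Hypothetical helper: stop new pods from landing on the node, then evict the
# pods already running there. DaemonSet pods are left in place, and data in
# emptyDir volumes on the node is deleted. $1 is the node name.
isolate_node() {
  : "${1:?usage: isolate_node <node-name>}"
  kubectl cordon "$1" &&
    kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Hypothetical helper: allow scheduling on the node again once it is healthy.
restore_node() {
  kubectl uncordon "${1:?usage: restore_node <node-name>}"
}
```

Draining deletes emptyDir data and may stall if the kubelet on the node is unresponsive, so review what is running on the node before using it.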
