---
id: fcadf10f-39f4-4942-8309-0ebdc4fe0cbb
---

# Node Not Ready in Kubernetes Cluster
---

Node Not Ready in Kubernetes Cluster is an incident type that occurs when a node in a Kubernetes cluster stops responding or is not ready to accept workloads. This can disrupt service and lead to downtime, because the cluster cannot schedule workloads onto the affected node. Common causes include hardware issues, network problems, and configuration errors. Resolve the incident quickly so that the cluster can function correctly and provide uninterrupted service.

### Parameters
```shell
# Environment Variables

export NODE_NAME="PLACEHOLDER"

export USERNAME="PLACEHOLDER"

```

## Debug

### Get the status of all nodes in the Kubernetes cluster
```shell
kubectl get nodes
```
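On a large cluster it can help to list only the nodes that are not ready. The following is a minimal sketch, assuming the default `kubectl get nodes` column layout in which the second column is STATUS. `not_ready_nodes` is a hypothetical helper name, not part of kubectl.

```shell
# Print the names of nodes whose STATUS does not start with "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

Usage: `kubectl get nodes --no-headers | not_ready_nodes`. Nodes that are cordoned but healthy (`Ready,SchedulingDisabled`) are not listed.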

### Check the events for a specific node to see if there are any error messages
```shell
kubectl describe node ${NODE_NAME}
```
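The `Conditions` section of the describe output is usually the most informative part. As a rough sketch, the conditions can be printed one per line and filtered down to the unhealthy ones. Both function names are hypothetical, and the filter assumes that every condition other than `Ready` is healthy when its status is `False`, which holds for the standard kubelet conditions but may not hold for custom ones.

```shell
# Print each condition of a node as "TYPE<TAB>STATUS<TAB>REASON".
# $1: node name
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Keep only conditions in an unhealthy state: Ready should be "True",
# every other condition (MemoryPressure, DiskPressure, ...) should be "False".
# Reads the tab-separated output of node_conditions on stdin.
abnormal_conditions() {
  awk -F'\t' '($1 == "Ready" && $2 != "True") || ($1 != "Ready" && $2 != "False")'
}
```

Usage: `node_conditions "${NODE_NAME}" | abnormal_conditions`. An empty result means every reported condition is in its healthy state.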

### Check the status of the kubelet process on the node
```shell
sudo systemctl status kubelet
```

### Check the logs for the kubelet process on the node
```shell
sudo journalctl -u kubelet
```
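The full kubelet journal can be long. One way to narrow it down is to keep only error and fatal lines. This is a sketch that assumes the kubelet writes its default klog text format, in which each message starts with a severity letter followed by the date (for example `E0102 03:04:05.678901`). `kubelet_errors` is a hypothetical helper name.

```shell
# Keep only error (E) and fatal (F) lines in klog text format.
# Reads journal output on stdin.
kubelet_errors() {
  grep -E '(^|[[:space:]])[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}
```

Usage: `sudo journalctl -u kubelet --since "1 hour ago" --no-pager | kubelet_errors`. If the kubelet is configured for JSON logging, this pattern will not match and a different filter is needed.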

### Check the status of the container runtime on the node
```shell
# If the node uses a different runtime, substitute its unit name (for example containerd or crio)
sudo systemctl status docker
```

### Check the logs for the container runtime on the node
```shell
# If the node uses a different runtime, substitute its unit name (for example containerd or crio)
sudo journalctl -u docker
```

### Check the status of the Kubernetes control plane components
```shell
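# Note: componentstatuses has been deprecated since Kubernetes v1.19 and may be inaccurate on newer clusters
kubectl get componentstatuses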
```

### Check the status of the Kubernetes scheduler
```shell
kubectl get pods -n kube-system -l component=kube-scheduler
```

### Check the status of the Kubernetes controller manager
```shell
kubectl get pods -n kube-system -l component=kube-controller-manager
```

### Check whether insufficient resources on the Kubernetes node are causing it to become unresponsive
```shell
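#!/bin/bash

# Node to inspect, taken from the environment (see Parameters)
NODE_NAME="${NODE_NAME:?NODE_NAME must be set}"

# Get the node's current resource usage (requires metrics-server)
# Columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
RESOURCE_USAGE=$(kubectl top node "$NODE_NAME" --no-headers) || exit 1

# Get the node's allocatable resources, for reference
CAPACITY=$(kubectl get node "$NODE_NAME" -o jsonpath='{.status.allocatable}')

# Extract the CPU and memory usage percentages and strip the trailing "%"
read -r _ _ CPU_PCT _ MEM_PCT <<< "$RESOURCE_USAGE"
CPU_PCT=${CPU_PCT%\%}
MEM_PCT=${MEM_PCT%\%}

# A node that is not ready may report no metrics, in which case the values are not numeric
if ! [[ $CPU_PCT =~ ^[0-9]+$ && $MEM_PCT =~ ^[0-9]+$ ]]; then
  echo "No usage metrics available for node $NODE_NAME."
  exit 1
fi

# Check if the node is using all of the resources it has available
if (( CPU_PCT >= 100 || MEM_PCT >= 100 )); then
  echo "Node $NODE_NAME is using more resources than it has available."
  echo "Resource usage: $RESOURCE_USAGE"
  echo "Capacity: $CAPACITY"
else
  echo "Node $NODE_NAME has sufficient resources."
fi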
```

## Repair

### Restart the kubelet service on the node to make sure it's properly connected to the control plane.
```shell
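#!/bin/bash

# Node and SSH user, taken from the environment (see Parameters)
NODE_NAME="${NODE_NAME:?NODE_NAME must be set}"
USERNAME="${USERNAME:?USERNAME must be set}"

# Restart the kubelet service
# Assumes NODE_NAME is reachable over SSH and the user may run sudo on the node
ssh "${USERNAME}@${NODE_NAME}" "sudo systemctl restart kubelet"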
```
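After a kubelet restart it is worth confirming that the node actually returns to the Ready state. A minimal sketch using `kubectl wait` follows. The function name and the 120-second default timeout are illustrative assumptions, not values from this runbook.

```shell
# Block until the node reports the Ready condition, or fail after the timeout.
# $1: node name, $2: timeout (optional, defaults to 120s)
wait_for_node_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout="${2:-120s}"
}
```

Usage: `wait_for_node_ready "${NODE_NAME}" 180s`. A non-zero exit status means the node did not become Ready in time, and the Debug steps should be repeated.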






