---
id: 27e3c297-aea5-4bfa-aeb2-1b2ca75309dc
---

# Kubernetes - Pods not scheduled preventing application scaling.
---

This incident type occurs when pods in a Kubernetes cluster cannot be scheduled, which prevents the application from scaling. Kubernetes is a container orchestration platform that automates the deployment, scaling, and management of containerized applications, and pods are the smallest deployable units it creates and manages. A pod that cannot be scheduled stays in the `Pending` phase because the Kubernetes scheduler cannot find a node that satisfies its requirements. The most common cause is that no node has enough unreserved CPU or memory for the pod's requests; taints, node selectors, and affinity rules can also exclude every candidate node. Until the pods are placed, the application runs with fewer replicas than requested, which degrades performance and blocks scale-out.

### Parameters
```shell
export NODE_NAME="PLACEHOLDER"

export POD_NAME="PLACEHOLDER"

export RESOURCE_GROUP_NAME="PLACEHOLDER"

export NODE_TYPE="PLACEHOLDER"

export CLUSTER_NAME="PLACEHOLDER"

export NEW_NODE_COUNT="PLACEHOLDER"

export NODE_COUNT="PLACEHOLDER"
```

## Debug

### Check if pods are pending
```shell
kubectl get pods --field-selector=status.phase=Pending
```
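The query above only looks at the current namespace. As a minimal sketch of a cluster-wide view, the helper below counts `Pending` pods per namespace. The function name `pending_by_namespace` is hypothetical, and it assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (namespace first, status fourth).

```shell
# Hypothetical helper: count Pending pods per namespace.
# Reads rows in the default `kubectl get pods --all-namespaces --no-headers`
# layout on stdin: NAMESPACE NAME READY STATUS RESTARTS AGE
pending_by_namespace() {
  awk '$4 == "Pending" { count[$1]++ }
       END { for (ns in count) print ns, count[ns] }' | sort
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods --all-namespaces --no-headers | pending_by_namespace
```

A namespace with a large count is usually the best place to pick a `POD_NAME` for the checks that follow.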

### Check if there are enough resources available to schedule the pod
```shell
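# Capacity and Allocatable show what the node offers; Allocated resources shows what is already requested
kubectl describe node "${NODE_NAME}" | grep -i -A 8 -E "^(capacity|allocatable|allocated resources)"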
```
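The scheduler decides placement from the requests already counted against a node, not from its raw capacity. The sketch below pulls just the CPU and memory request lines out of the `Allocated resources` section of `kubectl describe node` output. The function name `allocated_requests` is hypothetical, and the parsing assumes the usual layout of that section, which may differ between kubectl versions.

```shell
# Hypothetical helper: print requested CPU and memory (with percentages)
# from the "Allocated resources" section of `kubectl describe node` output.
allocated_requests() {
  awk '
    /^Allocated resources:/ { sec = 1; next }   # section starts here
    sec && /^[^ \t]/        { sec = 0 }         # next top-level field ends it
    sec && ($1 == "cpu" || $1 == "memory") { print $1, $2, $3 }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl describe node "${NODE_NAME}" | allocated_requests
```

Percentages close to 100% on every node suggest that new pods cannot fit until nodes are added or requests are lowered.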

### Check if there are any taints or affinity rules preventing the pod from being scheduled
```shell
kubectl describe node ${NODE_NAME} | grep -i taint

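kubectl describe pod "${POD_NAME}" | grep -i -E "node-selectors|tolerations"

# Affinity rules are not shown by describe; read them from the pod spec
kubectl get pod "${POD_NAME}" -o jsonpath='{.spec.affinity}'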
```
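When a node carries several taints, `kubectl describe node` prints the additional ones on indented continuation lines, which a single-line match on "taint" does not show. The sketch below lists every taint, one per line. The function name `node_taints` is hypothetical, and the parsing assumes the continuation-line layout just described.

```shell
# Hypothetical helper: list all taints from `kubectl describe node` output,
# including those printed on indented continuation lines.
node_taints() {
  awk '
    /^Taints:/     { t = 1; print $2; next }  # first taint shares the label line
    t && /^[ \t]/  { print $1; next }         # further taints are indented
    t              { exit }                   # next top-level field: done
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl describe node "${NODE_NAME}" | node_taints
```

Each taint listed here needs a matching toleration on the pod, otherwise the node is excluded from scheduling.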

### Check if there are any events related to the pod scheduling failure
```shell
# Print the Events section through the end of the output, not just its header line
kubectl describe pod "${POD_NAME}" | sed -n '/^Events:/,$p'
```
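A `FailedScheduling` event message usually names the reason each node was rejected. As a rough sketch, the helper below sorts a message into the causes this runbook checks for. The function name `classify_scheduling_failure` and the category labels are hypothetical, and the keyword matching is an assumption: the exact wording of scheduler messages varies between Kubernetes versions.

```shell
# Hypothetical helper: classify a FailedScheduling message read from stdin.
# Keyword matching is an assumption; scheduler wording differs by version.
classify_scheduling_failure() {
  msg="$(cat)"
  case "$msg" in *Insufficient*)        echo "resources" ;; esac
  case "$msg" in *taint*)               echo "taints" ;; esac
  case "$msg" in *affinity*|*selector*) echo "affinity-or-selector" ;; esac
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get events \
#     --field-selector "involvedObject.name=${POD_NAME},reason=FailedScheduling" \
#     -o jsonpath='{.items[*].message}' | classify_scheduling_failure
```

Only the `resources` category is addressed by adding nodes; the other two call for changes to tolerations, selectors, or affinity rules.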

## Repair

### Check whether the nodes have enough resources to schedule the pods. If not, provision new nodes by scaling the EKS, AKS, or GKE cluster
```shell
#!/bin/bash

# Inputs come from the Parameters section.
# CLOUD_PROVIDER (eks, azure, or gke) is an assumption: it is not in the
# parameter list above, but the cluster name alone cannot identify the provider.
CLOUD_PROVIDER=${CLOUD_PROVIDER:?set CLOUD_PROVIDER to eks, azure, or gke}
CLUSTER_NAME=${CLUSTER_NAME:?}
NODE_COUNT=${NODE_COUNT:?}
NODE_TYPE=${NODE_TYPE:?}   # used as the node group / node pool name
NEW_NODE_COUNT=${NEW_NODE_COUNT:?}

# The scheduler reports resource shortages as FailedScheduling events whose
# message contains "Insufficient cpu" or "Insufficient memory".
if ! kubectl get events --all-namespaces --field-selector reason=FailedScheduling \
     -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' | grep -q "Insufficient"; then
  echo "No resource-related scheduling failures found; the nodes have enough resources."
else
  echo "There are not enough resources available on the nodes to schedule the pods."

  # Provision new nodes by scaling the node group or node pool
  case "$CLOUD_PROVIDER" in
    eks)
      echo "Scaling the EKS node group."
      aws eks update-nodegroup-config --cluster-name "$CLUSTER_NAME" --nodegroup-name "$NODE_TYPE" \
        --scaling-config "minSize=$NODE_COUNT,maxSize=$NEW_NODE_COUNT,desiredSize=$NEW_NODE_COUNT"
      ;;
    azure)
      echo "Scaling the AKS node pool."
      # "nodepool scale" sets the node count; "nodepool update" does not accept --node-count
      az aks nodepool scale --name "$NODE_TYPE" --cluster-name "$CLUSTER_NAME" \
        --resource-group "${RESOURCE_GROUP_NAME}" --node-count "$NEW_NODE_COUNT"
      ;;
    gke)
      echo "Scaling the GKE node pool."
      gcloud container clusters resize "$CLUSTER_NAME" --node-pool "$NODE_TYPE" --num-nodes "$NEW_NODE_COUNT"
      ;;
    *)
      echo "Invalid cloud provider specified."
      exit 1
      ;;
  esac
fi
```
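Before any cloud CLI is asked to scale, it can help to confirm that the node counts are real numbers and that the target is actually larger than the current size, since an unreplaced `PLACEHOLDER` value would otherwise be passed straight through. The following is a minimal sketch; the function name `validate_scale_target` is hypothetical.

```shell
# Hypothetical helper: succeed only if both arguments are non-negative
# integers and the target count is greater than the current count.
validate_scale_target() {
  current="$1"
  target="$2"
  for v in "$current" "$target"; do
    case "$v" in
      ''|*[!0-9]*)
        echo "Not a non-negative integer: '$v'" >&2
        return 1
        ;;
    esac
  done
  if [ "$target" -le "$current" ]; then
    echo "Target ($target) must be greater than current ($current)." >&2
    return 1
  fi
}

# Usage:
#   validate_scale_target "$NODE_COUNT" "$NEW_NODE_COUNT" || exit 1
```

Running this check first keeps a typo in the parameters from shrinking a node pool or failing halfway through the repair.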

