---
id: 288214fd-c805-476f-b455-1e9f3a5a4e46
---

# Kubernetes Deployments Replica Pods Monitoring Incident
---

This incident type relates to the monitoring of Kubernetes deployments replica pods. It implies that there is an issue with the number of replica pods available as compared to the desired number. The incident might be triggered by a query alert monitor and might require immediate action to resolve the issue. The incident could impact the deployment of applications hosted on Kubernetes and might require troubleshooting and fixing the underlying issue.

### Parameters
```shell
# Environment Variables
export NAMESPACE="PLACEHOLDER"
export POD_NAME="PLACEHOLDER"
export DEPLOYMENT_NAME="PLACEHOLDER"
export PATH_TO_KUBE_MANIFESTS="PLACEHOLDER"
export CONTEXT_NAME="PLACEHOLDER"
```

## Debug

### List all deployments in the affected namespace
```shell
kubectl get deployments -n ${NAMESPACE}
```

### Check if there are any pods that are not ready
```shell
kubectl describe pods -n ${NAMESPACE}
```

### Check the logs of the affected pods for any errors
```shell
kubectl logs ${POD_NAME} -n ${NAMESPACE}
```

### Check the events in the affected namespace
```shell
kubectl get events -n ${NAMESPACE}
```

### Check the status of the Kubernetes nodes
```shell
kubectl get nodes
```

### Check the status of the Kubernetes services
```shell
kubectl get services -n ${NAMESPACE}
```

### Check the Kubernetes events related to the deployment
```shell
kubectl describe deployment ${DEPLOYMENT_NAME} -n ${NAMESPACE}
```

### A recent deployment or upgrade of applications on Kubernetes might have caused the pods to go down.
```shell
bash
#!/bin/bash

# Set the target Kubernetes namespace and deployment name
NAMESPACE=${NAMESPACE}
DEPLOYMENT=${DEPLOYMENT_NAME}

# Check the status of the deployment
DEPLOYMENT_STATUS=$(kubectl -n $NAMESPACE get deployment $DEPLOYMENT -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')

if [ "$DEPLOYMENT_STATUS" == "False" ]; then
  # Get the deployment events
  kubectl -n $NAMESPACE describe deployment $DEPLOYMENT | grep -A 10 -i events

  # Check the logs of the deployment pods
  PODS=$(kubectl -n $NAMESPACE get pods -l app=$DEPLOYMENT -o jsonpath='{.items[*].metadata.name}')
  for POD in $PODS; do
    echo "### Logs for pod $POD ###"
    kubectl -n $NAMESPACE logs $POD
  done
fi

```

### There might be a scaling issue where the desired number of replica pods is not being met due to resource constraints or misconfiguration.
```shell

#!/bin/bash

# Set Kubernetes context
kubectl config use-context ${CONTEXT_NAME}

# Check if the deployment exists
if ! kubectl get deployment ${DEPLOYMENT_NAME}; then
  echo "Deployment does not exist"
  exit 1
fi

# Check the number of available replicas
available=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=jsonpath='{.status.availableReplicas}')
if [[ $available -eq 0 ]]; then
  echo "No available replicas"
  exit 1
fi

# Check the number of desired replicas
desired=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=jsonpath='{.spec.replicas}')
if [[ $desired -eq 0 ]]; then
  echo "No desired replicas"
  exit 1
fi

# Check if the number of available replicas equals the number of desired replicas
if [[ $available -ne $desired ]]; then
  echo "Scaling issue - available replicas do not match desired replicas"
  exit 1
fi

echo "Scaling is working correctly"

```

---

## Repair
---
### Check if any recent changes were made to the deployment that could have caused the issue. Verify if the replicas are scaled down or if there is a problem with the deployment configuration.
```shell

#!/bin/bash

# Check if deployment has been modified recently
last_modified=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.metadata.creationTimestamp')
echo "Deployment was last modified on: $last_modified"

# Check if replicas are scaled down
current_replicas=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.spec.replicas')
available_replicas=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.status.availableReplicas')
if [[ $available_replicas -lt $current_replicas ]]; then
    echo "Replicas are scaled down. Current replicas: $current_replicas, Available replicas: $available_replicas"
fi

# Verify if there is a problem with the deployment configuration
deployment_status=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.status.conditions[-1].status')
if [[ $deployment_status == "False" ]]; then
    deployment_message=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.status.conditions[-1].message')
    echo "There is a problem with the deployment configuration: $deployment_message"
fi

```
### Perform a Rolling Restart of a Kubernetes Deployment.
.
```shell
#!/bin/bash

kubectl rollout restart deployment/${DEPLOYMENT_NAME} -n ${NAMESPACE}
 
```

---


This incident type relates to the monitoring of Kubernetes deployments replica pods. It implies that there is an issue with the number of replica pods available as compared to the desired number. The incident might be triggered by a query alert monitor and might require immediate action to resolve the issue. The incident could impact the deployment of applications hosted on Kubernetes and might require troubleshooting and fixing the underlying issue.


The Vault cluster health incident is related to the health of a Vault cluster instance. This incident type is triggered when the cluster instance is not healthy and requires attention to ensure it is functioning properly. The incident typically involves evaluating the current state of the cluster instance, diagnosing the issue, and taking corrective action to restore the health of the instance.


Vault cluster health incident on kubernetes

The Vault Sealed incident type occurs when an instance of the Vault has been sealed due to an error or issue. This can cause disruption to the system and requires immediate attention to resolve the issue and unseal the Vault instance. It is important to identify the root cause of the issue in order to prevent it from happening again in the future.


Vault Sealed Incident

This incident type involves an issue with Kubernetes deployments where the expected number of pods to run is not matching the actual number of pods running. This can lead to alerts being triggered and potential disruptions in the system.


Kubernetes pods not starting - Deployment issue

The incident type of "Kubernetes deployment with multiple restarts" indicates that a Kubernetes deployment has experienced multiple restarts within a certain timeframe, which is usually indicative of a problem. Kubernetes is a popular container orchestration platform that automates the deployment, scaling, and management of containerized applications. When a deployment experiences multiple restarts, it can impact the availability and performance of the application, and can be a sign of underlying issues that need to be addressed. This incident type is typically monitored and managed by DevOps teams responsible for ensuring the health and reliability of Kubernetes-based applications.


Kubernetes deployment with multiple restarts

This incident type involves monitoring the replicas of a Kubernetes Statefulset, which is a type of workload in Kubernetes used for stateful applications. The incident is triggered when more than one replica's pods are down, creating an unsafe situation for manual operations. This incident is critical and requires immediate attention to resolve the issue and ensure the smooth functioning of the stateful applications.


Kubernetes Statefulset Replicas Monitoring Incident

```shell
# Environment Variables
export NAMESPACE="PLACEHOLDER"
export POD_NAME="PLACEHOLDER"
export DEPLOYMENT_NAME="PLACEHOLDER"
export PATH_TO_KUBE_MANIFESTS="PLACEHOLDER"
export CONTEXT_NAME="PLACEHOLDER"
```


### List all deployments in the affected namespace

```shell
kubectl get deployments -n ${NAMESPACE}
```

### Check if there are any pods that are not ready

```shell
kubectl describe pods -n ${NAMESPACE}
```

### Check the logs of the affected pods for any errors

```shell
kubectl logs ${POD_NAME} -n ${NAMESPACE}
```

### Check the events in the affected namespace

```shell
kubectl get events -n ${NAMESPACE}
```

### Check the status of the Kubernetes nodes

```shell
kubectl get nodes
```

### Check the status of the Kubernetes services

```shell
kubectl get services -n ${NAMESPACE}
```

### Check the Kubernetes events related to the deployment

```shell
kubectl describe deployment ${DEPLOYMENT_NAME} -n ${NAMESPACE}
```

### A recent deployment or upgrade of applications on Kubernetes might have caused the pods to go down.

```shell
bash
#!/bin/bash

# Set the target Kubernetes namespace and deployment name
NAMESPACE=${NAMESPACE}
DEPLOYMENT=${DEPLOYMENT_NAME}

# Check the status of the deployment
DEPLOYMENT_STATUS=$(kubectl -n $NAMESPACE get deployment $DEPLOYMENT -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')

if [ "$DEPLOYMENT_STATUS" == "False" ]; then
  # Get the deployment events
  kubectl -n $NAMESPACE describe deployment $DEPLOYMENT | grep -A 10 -i events

  # Check the logs of the deployment pods
  PODS=$(kubectl -n $NAMESPACE get pods -l app=$DEPLOYMENT -o jsonpath='{.items[*].metadata.name}')
  for POD in $PODS; do
    echo "### Logs for pod $POD ###"
    kubectl -n $NAMESPACE logs $POD
  done
fi

```

### There might be a scaling issue where the desired number of replica pods is not being met due to resource constraints or misconfiguration.

```shell

#!/bin/bash

# Set Kubernetes context
kubectl config use-context ${CONTEXT_NAME}

# Check if the deployment exists
if ! kubectl get deployment ${DEPLOYMENT_NAME}; then
  echo "Deployment does not exist"
  exit 1
fi

# Check the number of available replicas
available=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=jsonpath='{.status.availableReplicas}')
if [[ $available -eq 0 ]]; then
  echo "No available replicas"
  exit 1
fi

# Check the number of desired replicas
desired=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=jsonpath='{.spec.replicas}')
if [[ $desired -eq 0 ]]; then
  echo "No desired replicas"
  exit 1
fi

# Check if the number of available replicas equals the number of desired replicas
if [[ $available -ne $desired ]]; then
  echo "Scaling issue - available replicas do not match desired replicas"
  exit 1
fi

echo "Scaling is working correctly"

```


### Check if any recent changes were made to the deployment that could have caused the issue. Verify if the replicas are scaled down or if there is a problem with the deployment configuration.

```shell

#!/bin/bash

# Check if deployment has been modified recently
last_modified=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.metadata.creationTimestamp')
echo "Deployment was last modified on: $last_modified"

# Check if replicas are scaled down
current_replicas=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.spec.replicas')
available_replicas=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.status.availableReplicas')
if [[ $available_replicas -lt $current_replicas ]]; then
    echo "Replicas are scaled down. Current replicas: $current_replicas, Available replicas: $available_replicas"
fi

# Verify if there is a problem with the deployment configuration
deployment_status=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.status.conditions[-1].status')
if [[ $deployment_status == "False" ]]; then
    deployment_message=$(kubectl get deployment ${DEPLOYMENT_NAME} -o=json | jq -r '.status.conditions[-1].message')
    echo "There is a problem with the deployment configuration: $deployment_message"
fi

```

### Perform a Rolling Restart of a Kubernetes Deployment.

.

```shell
#!/bin/bash

kubectl rollout restart deployment/${DEPLOYMENT_NAME} -n ${NAMESPACE}
 
```


Kubernetes Deployments Replica Pods Monitoring Incident

Overview

Parameters

Debug

List all deployments in the affected namespace

Check if there are any pods that are not ready

Check the logs of the affected pods for any errors

Check the events in the affected namespace

Check the status of the Kubernetes nodes

Check the status of the Kubernetes services

A recent deployment or upgrade of applications on Kubernetes might have caused the pods to go down.

There might be a scaling issue where the desired number of replica pods is not being met due to resource constraints or misconfiguration.

Repair

Check if any recent changes were made to the deployment that could have caused the issue. Verify if the replicas are scaled down or if there is a problem with the deployment configuration.

Perform a Rolling Restart of a Kubernetes Deployment.

Learn more

Related Runbooks

Vault cluster health incident on kubernetes

Vault Sealed Incident

Kubernetes pods not starting - Deployment issue

Kubernetes deployment with multiple restarts

Support