---
id: 1c2a8ec3-1239-4f2d-98dc-5479c97443d0
---

# Kubernetes Pod Restarting Monitoring
---

A Kubernetes Pod Restarting Monitoring incident is triggered when a pod restarts multiple times within a set time window. Repeated restarts usually point to a problem with the application or the infrastructure it runs on; common causes include resource constraints, misconfigurations, and bugs in the application code. Resolve the incident by identifying and fixing the underlying cause of the restarts.

### Parameters
```shell
# Environment Variables
export POD_NAMESPACE="PLACEHOLDER"
export POD_NAME="PLACEHOLDER"
export CONTAINER_NAME="PLACEHOLDER"
export K8S_MANIFEST_FILE="PLACEHOLDER"
```

## Debug

### List all pods in the namespace
```shell
kubectl get pods -n ${POD_NAMESPACE}
```
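
When the namespace holds many pods, it can help to narrow the list to those that have actually restarted. The sketch below assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name `pods_restarting_at_least` and its threshold argument are illustrative and not part of the original runbook.

```shell
# Hypothetical helper: list pods whose RESTARTS column meets a threshold.
# Assumes the default column order of `kubectl get pods --no-headers`.
pods_restarting_at_least() {
  local namespace="$1" min_restarts="${2:-2}"
  kubectl get pods -n "$namespace" --no-headers |
    awk -v min="$min_restarts" '$4 + 0 >= min { print $1 "\t" $4 }'
}
```

For example, `pods_restarting_at_least "${POD_NAMESPACE}" 3` prints the name and restart count of every pod with three or more restarts.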

### Get detailed information about a specific pod
```shell
kubectl describe pod ${POD_NAME} -n ${POD_NAMESPACE}
```
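
The `describe` output is long, and the most useful part for a restarting pod is usually the last termination state of each container. As a sketch, the same fields can be pulled out directly with a JSONPath query; the helper name `last_terminations` is an assumption made for illustration. A reason of `OOMKilled` or an exit code of 137 (128 + SIGKILL) typically points to a memory limit being hit.

```shell
# Hypothetical helper: one line per container with name, restart count,
# and the reason and exit code of its last termination (blank if none).
last_terminations() {
  local pod="$1" namespace="$2"
  kubectl get pod "$pod" -n "$namespace" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Call it as `last_terminations "${POD_NAME}" "${POD_NAMESPACE}"`.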

### View the logs for a specific container in a pod
```shell
kubectl logs ${POD_NAME} ${CONTAINER_NAME} -n ${POD_NAMESPACE}
```
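
After a restart, the command above shows the logs of the new container instance, which may not contain the error that caused the crash. A minimal sketch for reading the previous instance instead, using the `--previous` and `--tail` flags of `kubectl logs`; the helper name `previous_logs` and the default of 100 lines are assumptions.

```shell
# Hypothetical helper: tail the logs of the previous (terminated) container instance.
previous_logs() {
  local pod="$1" container="$2" namespace="$3" lines="${4:-100}"
  kubectl logs "$pod" -c "$container" -n "$namespace" --previous --tail="$lines"
}
```

Call it as `previous_logs "${POD_NAME}" "${CONTAINER_NAME}" "${POD_NAMESPACE}"`.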

### View the events related to a specific pod
```shell
kubectl get events -n ${POD_NAMESPACE} --field-selector involvedObject.name=${POD_NAME}
```
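
Event lists can be noisy and are not guaranteed to come back in time order. The following sketch restricts the output to `Warning` events for the pod and sorts them by last occurrence; the helper name `warning_events` is illustrative.

```shell
# Hypothetical helper: show only Warning events for one pod, oldest first.
warning_events() {
  local pod="$1" namespace="$2"
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.name=${pod},type=Warning" \
    --sort-by=.lastTimestamp
}
```

Call it as `warning_events "${POD_NAME}" "${POD_NAMESPACE}"`.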

### View the metrics for a specific pod
```shell
kubectl top pod ${POD_NAME} -n ${POD_NAMESPACE}
```
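
`kubectl top` relies on the Metrics API (usually provided by metrics-server) and reports pod-level totals by default. For a multi-container pod, a per-container breakdown is often more telling; a small sketch, with the helper name `container_usage` assumed for illustration:

```shell
# Hypothetical helper: per-container CPU and memory usage for one pod.
# Requires the Metrics API to be available in the cluster.
container_usage() {
  local pod="$1" namespace="$2"
  kubectl top pod "$pod" -n "$namespace" --containers
}
```

Call it as `container_usage "${POD_NAME}" "${POD_NAMESPACE}"`.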

### Misconfigurations: The pod may be restarting due to misconfigurations in the Kubernetes manifest files, such as incorrect environment variables or volume mounts. This may also be caused by misconfigured resource requests or limits.
```shell
#!/bin/bash

# Set variables
POD="${POD_NAME}"
NAMESPACE="${POD_NAMESPACE}"
K8S_MANIFEST="${K8S_MANIFEST_FILE}"

# Get pod status
POD_STATUS=$(kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.status.phase}')

# Check if pod is running
if [ "$POD_STATUS" != "Running" ]
then
  echo "Pod is not running. Current status: $POD_STATUS"
  exit 1
fi

# Get the restart count of the first container (empty if no status is reported yet)
POD_RESTARTS=$(kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.status.containerStatuses[0].restartCount}')

# Check if pod has restarted multiple times
if [ "${POD_RESTARTS:-0}" -lt 2 ]
then
  echo "Pod has not restarted multiple times. Current restart count: ${POD_RESTARTS:-0}"
  exit 1
fi

# List the manifest sections that most often cause restarts when misconfigured.
# Their presence is not an error by itself; review each reported line manually.
if grep -nE "env:|volumeMounts:|resources:" "$K8S_MANIFEST"
then
  echo "Review the sections listed above in manifest file: $K8S_MANIFEST"
  exit 0
else
  echo "No env, volumeMounts, or resources sections found in manifest file: $K8S_MANIFEST"
  exit 1
fi
```

### Resource constraints: The pod may be restarting due to insufficient resources such as CPU or memory. This may be due to high resource usage by other pods running on the same node or cluster, or because the pod's resource requests or limits are not properly configured.
```shell
#!/bin/bash

# STEP 1: Get the name and namespace of the pod to diagnose
pod_name="${POD_NAME}"
pod_namespace="${POD_NAMESPACE}"

# STEP 2: Get the name of the node where the pod is running
node_name=$(kubectl get pod "$pod_name" -n "$pod_namespace" -o jsonpath='{.spec.nodeName}')

# STEP 3: Check the resource usage of the node
kubectl top node "$node_name"

# STEP 4: Check the resource requests and limits of each container in the pod
# (grepping the describe output for "Limits:" and "Requests:" would print the labels without their values)
kubectl get pod "$pod_name" -n "$pod_namespace" -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'
```
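
Node-level usage from `kubectl top node` is a snapshot; the kubelet also records pressure conditions on the node object when resources run low. As a sketch, the helper below (the name `node_pressure_conditions` is an assumption) prints any pressure condition currently set to `True` and returns a non-zero status when none is.

```shell
# Hypothetical helper: print node pressure conditions that are currently True.
# Returns non-zero (from grep) when the node reports no pressure.
node_pressure_conditions() {
  local node="$1"
  kubectl get node "$node" -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}' |
    grep -E '^(MemoryPressure|DiskPressure|PIDPressure)=True$'
}
```

With `node_name` taken from the pod's `.spec.nodeName`, `node_pressure_conditions "$node_name" || echo "No pressure reported"` gives a quick yes/no answer.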

---

## Repair
---

### Adjust the memory requests and limits.
```shell
#!/bin/bash

# Set the deployment, container, request memory, and limit memory variables
deployment_name="PLACEHOLDER"
container_name="PLACEHOLDER"
request_memory="PLACEHOLDER"   # Kubernetes quantity, e.g. 1Gi
limit_memory="PLACEHOLDER"     # Kubernetes quantity, e.g. 2Gi

# Patch the deployment with the specified memory settings.
# The default strategic merge patch matches the container by name and leaves other containers untouched.
kubectl patch deployment "$deployment_name" -n "${POD_NAMESPACE}" -p "{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"$container_name\",\"resources\":{\"requests\":{\"memory\":\"$request_memory\"},\"limits\":{\"memory\":\"$limit_memory\"}}}]}}}}"
```
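
Changing the pod template triggers a new rollout, so it is worth confirming that the replacement pods become ready. The sketch below is an alternative to the hand-written JSON patch: it uses `kubectl set resources` followed by `kubectl rollout status`. The helper name `set_memory_and_wait` and the 120-second timeout are assumptions, not values from the original runbook.

```shell
# Hypothetical helper: set memory request/limit on one container, then wait for the rollout.
# The rollout check only runs if the resource update succeeds.
set_memory_and_wait() {
  local deployment="$1" container="$2" namespace="$3" request="$4" limit="$5"
  kubectl set resources deployment "$deployment" -n "$namespace" -c "$container" \
    --requests="memory=${request}" --limits="memory=${limit}" &&
    kubectl rollout status "deployment/${deployment}" -n "$namespace" --timeout=120s
}
```

For example, `set_memory_and_wait my-deployment my-container "${POD_NAMESPACE}" 1Gi 2Gi`, where the deployment and container names are placeholders.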

---


