---
id: 008fedc6-cb9c-48c0-b79e-199b93eca09b
---

# Kubernetes Daemonset Multiple Restarts
---

This incident type refers to an alert triggered by a Kubernetes Daemonset restarting multiple times within a short period of time. This can be an indication of a problem with the application or infrastructure and needs to be investigated and resolved promptly. The incident may require collaboration between the development and operations teams to identify the root cause and implement a fix to prevent further occurrences.

### Parameters
```shell
# Environment Variables

export DAEMONSET_NAME="PLACEHOLDER"

export POD_NAME="PLACEHOLDER"

export NODE_NAME="PLACEHOLDER"

export CONTAINER_NAME="PLACEHOLDER"

export NAMESPACE="PLACEHOLDER"
```

## Debug

### List all daemonsets in the default namespace
```shell
kubectl get daemonsets
```

### Describe a specific daemonset
```shell
kubectl describe daemonset ${DAEMONSET_NAME}
```

### Check the status of all daemonset pods
```shell
kubectl get pods -l 'app=${DAEMONSET_NAME}' -w
```

### Get logs for a specific pod
```shell
kubectl logs ${POD_NAME}
```

### Check the restart count for a specific pod
```shell
kubectl get pod ${POD_NAME} -o jsonpath='{.status.containerStatuses[0].restartCount}'
```

### Check the status of all nodes in the cluster
```shell
kubectl get nodes
```

### Check the status of all pods running on a specific node
```shell
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${NODE_NAME}
```

### The Kubernetes cluster may not have enough resources to support the containers running on the Daemonset, causing them to restart frequently.
```shell


#!/bin/bash



# Get the resource usage metrics for the Kubernetes cluster

kubectl top nodes



# Get the resource requests and limits set for the containers running on the Daemonset

kubectl describe daemonset ${DAEMONSET_NAME} | grep -A 5 Resources



# Check if any of the resources are being over utilized by the containers

kubectl describe daemonset ${DAEMONSET_NAME} | grep -A 10 Resources | grep -i -e 'cpu' -e 'memory'



# Check if there are any resource quotas set for the Kubernetes cluster

kubectl describe quota



# Check if any nodes are underutilized or overutilized

kubectl describe nodes



# Check if there are any pod eviction events triggered by resource constraints

kubectl get events --sort-by=.metadata.creationTimestamp | grep -i 'outofmemory' | tail -n 5


```

## Repair

### Increase the resources allocated to the containers to prevent them from running out of memory or CPU and causing a restart.
```shell
bash

#!/bin/bash



# Set the namespace and deployment name

NAMESPACE=${NAMESPACE}

DAEMONSET_NAME=${DAEMONSET_NAME}



# Get the current resource limits for the containers

CURRENT_LIMITS=$(kubectl get ds $DAEMONSET_NAME -n $NAMESPACE -o jsonpath='{.spec.template.spec.containers[0].resources.limits}')



# Increase the CPU and memory limits by 50%

CPU_LIMIT=$(echo $CURRENT_LIMITS | sed 's/cpu=/cpu=50%/g')

MEMORY_LIMIT=$(echo $CURRENT_LIMITS | sed 's/memory=/memory=50%/g')



# Update the deployment with the new limits

kubectl patch ds $DAEMONSET_NAME -n $NAMESPACE --patch \

"{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"${CONTAINER_NAME}\",\"resources\":{\"limits\":{\"$CPU_LIMIT\",\"$MEMORY_LIMIT\"}}}]}}}}"


```

This incident type refers to an alert triggered by a Kubernetes Daemonset restarting multiple times within a short period of time. This can be an indication of a problem with the application or infrastructure and needs to be investigated and resolved promptly. The incident may require collaboration between the development and operations teams to identify the root cause and implement a fix to prevent further occurrences.


The Vault cluster health incident is related to the health of a Vault cluster instance. This incident type is triggered when the cluster instance is not healthy and requires attention to ensure it is functioning properly. The incident typically involves evaluating the current state of the cluster instance, diagnosing the issue, and taking corrective action to restore the health of the instance.


Vault cluster health incident on kubernetes

This incident type refers to the failure of a systemd service on a particular host instance. The incident could be triggered by various causes such as a software bug, hardware failure, or system overload. This type of incident can cause downtime or service disruption to the affected host instance, which may require immediate resolution to restore normal operations.


Host systemd service crashed (instance) incident.

The incident type of "Kubernetes deployment with multiple restarts" indicates that a Kubernetes deployment has experienced multiple restarts within a certain timeframe, which is usually indicative of a problem. Kubernetes is a popular container orchestration platform that automates the deployment, scaling, and management of containerized applications. When a deployment experiences multiple restarts, it can impact the availability and performance of the application, and can be a sign of underlying issues that need to be addressed. This incident type is typically monitored and managed by DevOps teams responsible for ensuring the health and reliability of Kubernetes-based applications.


Kubernetes deployment with multiple restarts

This incident type involves monitoring the replicas of a Kubernetes Statefulset, which is a type of workload in Kubernetes used for stateful applications. The incident is triggered when more than one replica's pods are down, creating an unsafe situation for manual operations. This incident is critical and requires immediate attention to resolve the issue and ensure the smooth functioning of the stateful applications.


Kubernetes Statefulset Replicas Monitoring Incident

A Kubernetes Replicaset Incomplete incident typically occurs when a specific number of pods that should be running are not, due to reasons such as failed pod initialization, unavailability of resources in the cluster, or inability to pull the image. This incident is usually triggered when the difference between desired and running pods is greater than zero, and it can be detected through monitoring tools like Datadog.


Kubernetes Replicaset Incomplete

```shell
# Environment Variables

export DAEMONSET_NAME="PLACEHOLDER"

export POD_NAME="PLACEHOLDER"

export NODE_NAME="PLACEHOLDER"

export CONTAINER_NAME="PLACEHOLDER"

export NAMESPACE="PLACEHOLDER"
```


### List all daemonsets in the default namespace

```shell
kubectl get daemonsets
```

### Describe a specific daemonset

```shell
kubectl describe daemonset ${DAEMONSET_NAME}
```

### Check the status of all daemonset pods

```shell
kubectl get pods -l 'app=${DAEMONSET_NAME}' -w
```

### Get logs for a specific pod

```shell
kubectl logs ${POD_NAME}
```

### Check the restart count for a specific pod

```shell
kubectl get pod ${POD_NAME} -o jsonpath='{.status.containerStatuses[0].restartCount}'
```

### Check the status of all nodes in the cluster

```shell
kubectl get nodes
```

### Check the status of all pods running on a specific node

```shell
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${NODE_NAME}
```

### The Kubernetes cluster may not have enough resources to support the containers running on the Daemonset, causing them to restart frequently.

```shell


#!/bin/bash



# Get the resource usage metrics for the Kubernetes cluster

kubectl top nodes



# Get the resource requests and limits set for the containers running on the Daemonset

kubectl describe daemonset ${DAEMONSET_NAME} | grep -A 5 Resources



# Check if any of the resources are being over utilized by the containers

kubectl describe daemonset ${DAEMONSET_NAME} | grep -A 10 Resources | grep -i -e 'cpu' -e 'memory'



# Check if there are any resource quotas set for the Kubernetes cluster

kubectl describe quota



# Check if any nodes are underutilized or overutilized

kubectl describe nodes



# Check if there are any pod eviction events triggered by resource constraints

kubectl get events --sort-by=.metadata.creationTimestamp | grep -i 'outofmemory' | tail -n 5


```


### Increase the resources allocated to the containers to prevent them from running out of memory or CPU and causing a restart.

```shell
bash

#!/bin/bash



# Set the namespace and deployment name

NAMESPACE=${NAMESPACE}

DAEMONSET_NAME=${DAEMONSET_NAME}



# Get the current resource limits for the containers

CURRENT_LIMITS=$(kubectl get ds $DAEMONSET_NAME -n $NAMESPACE -o jsonpath='{.spec.template.spec.containers[0].resources.limits}')



# Increase the CPU and memory limits by 50%

CPU_LIMIT=$(echo $CURRENT_LIMITS | sed 's/cpu=/cpu=50%/g')

MEMORY_LIMIT=$(echo $CURRENT_LIMITS | sed 's/memory=/memory=50%/g')



# Update the deployment with the new limits

kubectl patch ds $DAEMONSET_NAME -n $NAMESPACE --patch \

"{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"${CONTAINER_NAME}\",\"resources\":{\"limits\":{\"$CPU_LIMIT\",\"$MEMORY_LIMIT\"}}}]}}}}"


```


Kubernetes Daemonset Multiple Restarts

Overview

Parameters

Debug

List all daemonsets in the default namespace

Describe a specific daemonset

Check the status of all daemonset pods

Get logs for a specific pod

Check the restart count for a specific pod

Check the status of all nodes in the cluster

Check the status of all pods running on a specific node

The Kubernetes cluster may not have enough resources to support the containers running on the Daemonset, causing them to restart frequently.

Repair

Increase the resources allocated to the containers to prevent them from running out of memory or CPU and causing a restart.

Learn more

Related Runbooks

Vault cluster health incident on kubernetes

Host systemd service crashed (instance) incident.

Kubernetes deployment with multiple restarts

Kubernetes Statefulset Replicas Monitoring Incident

Support