This incident type covers alerts triggered when the pods of a Kubernetes DaemonSet restart multiple times within a short period. Frequent restarts usually indicate a problem with the application or the underlying infrastructure and should be investigated promptly. Resolving the incident may require the development and operations teams to work together to identify the root cause and implement a fix that prevents recurrence.
Parameters
Debug
List all daemonsets in the default namespace
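A minimal sketch of this step, assuming `kubectl` is already configured for the affected cluster. The helper name `list_daemonsets` is hypothetical and only wraps the standard `kubectl get daemonsets` command.

```shell
# List DaemonSets in a namespace.
# $1: namespace (optional, defaults to "default")
list_daemonsets() {
  kubectl get daemonsets --namespace "${1:-default}" -o wide
}
```

Run `list_daemonsets` for the default namespace, or pass another namespace such as `list_daemonsets kube-system`.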
Describe a specific daemonset
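One way to do this, sketched under the assumption that you know the DaemonSet's name and namespace. The helper name `describe_daemonset` is hypothetical.

```shell
# Show the spec, selector, pod counts, and recent events of a DaemonSet.
# $1: DaemonSet name (required)
# $2: namespace (optional, defaults to "default")
describe_daemonset() {
  kubectl describe daemonset "${1:?usage: describe_daemonset NAME [NAMESPACE]}" \
    --namespace "${2:-default}"
}
```

For example, `describe_daemonset node-exporter monitoring`, where both names are placeholders for your own workload. The events at the end of the output are often the quickest hint about why pods are being recreated.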
Check the status of all daemonset pods
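A possible sketch, assuming the DaemonSet's pods can be matched with a label selector. The selector itself is workload-specific and is an assumption here; the `Selector:` field in the `kubectl describe daemonset` output shows the real one. The helper name `daemonset_pod_status` is hypothetical.

```shell
# List the pods matched by a label selector, including the node each runs on.
# $1: label selector of the DaemonSet's pods (required), e.g. "app=node-exporter"
# $2: namespace (optional, defaults to "default")
daemonset_pod_status() {
  kubectl get pods \
    --selector "${1:?usage: daemonset_pod_status SELECTOR [NAMESPACE]}" \
    --namespace "${2:-default}" -o wide
}
```

The `STATUS` and `RESTARTS` columns show which pods are crash-looping, and the `NODE` column shows whether the restarts are confined to particular nodes.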
Get logs for a specific pod
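A sketch of this step with two hypothetical helpers. The second one uses `--previous`, which returns the logs of the last terminated container instance and is usually the more useful view for a restarting pod. The 200-line tail is an arbitrary choice.

```shell
# Logs of the currently running container(s) of a pod.
# $1: pod name (required), $2: namespace (optional, defaults to "default")
pod_logs() {
  kubectl logs "${1:?usage: pod_logs POD [NAMESPACE]}" \
    --namespace "${2:-default}" --tail=200
}

# Logs of the previous (terminated) container instance of a pod.
# Fails if the container has not restarted yet.
previous_pod_logs() {
  kubectl logs "${1:?usage: previous_pod_logs POD [NAMESPACE]}" \
    --namespace "${2:-default}" --previous --tail=200
}
```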
Check the restart count for a specific pod
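The restart count is stored per container in the pod status. The following sketch reads it with a JSONPath query; the helper name `pod_restart_counts` is hypothetical.

```shell
# Print one line per container of a pod: container name, a tab, restart count.
# $1: pod name (required), $2: namespace (optional, defaults to "default")
pod_restart_counts() {
  kubectl get pod "${1:?usage: pod_restart_counts POD [NAMESPACE]}" \
    --namespace "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}
```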
Check the status of all nodes in the cluster
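A minimal sketch of this step. The helper name `node_status` is hypothetical; the command is the standard node listing.

```shell
# List all nodes with their readiness, roles, version, and addresses.
node_status() {
  kubectl get nodes -o wide
}
```

Nodes that are `NotReady` or that recently rejoined the cluster are worth correlating with the pods that restarted.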
Check the status of all pods running on a specific node
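Pods can be filtered by the node they are scheduled on with a field selector on `spec.nodeName`. A sketch, with the hypothetical helper name `pods_on_node`:

```shell
# List the pods from every namespace that are scheduled on a given node.
# $1: node name (required)
pods_on_node() {
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=${1:?usage: pods_on_node NODE}"
}
```

If many unrelated pods on the same node are restarting, the node itself is the more likely culprit than the DaemonSet.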
The cluster nodes may not have enough CPU or memory for the containers that the DaemonSet runs, causing them to restart frequently, for example when a container is OOM-killed.
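One way to test this hypothesis is to look at why the containers last terminated. The sketch below, with the hypothetical helper name `last_termination_reason`, prints the recorded reason and exit code per container. A reason of `OOMKilled` (typically with exit code 137) points to a memory limit that is too low or to memory pressure.

```shell
# Print one line per container: name, last termination reason, exit code.
# Fields are empty for containers that have never terminated.
# $1: pod name (required), $2: namespace (optional, defaults to "default")
last_termination_reason() {
  kubectl get pod "${1:?usage: last_termination_reason POD [NAMESPACE]}" \
    --namespace "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```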
Repair
Increase the CPU and memory requests and limits of the DaemonSet's containers so that they no longer run out of memory or CPU and restart.
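A hedged sketch of this repair using `kubectl set resources`, followed by a wait for the rollout to finish. The helper name `increase_daemonset_resources` and the default request and limit values are assumptions for illustration only; choose values based on the observed usage of your workload. Without a `--containers` flag, the change applies to every container in the pod template.

```shell
# Raise the resource requests and limits of a DaemonSet, then wait for the rollout.
# $1: DaemonSet name (required)
# $2: namespace (optional, defaults to "default")
# $3: requests (optional, illustrative default)
# $4: limits (optional, illustrative default)
increase_daemonset_resources() {
  kubectl set resources daemonset \
    "${1:?usage: increase_daemonset_resources NAME [NAMESPACE] [REQUESTS] [LIMITS]}" \
    --namespace "${2:-default}" \
    --requests="${3:-cpu=200m,memory=256Mi}" \
    --limits="${4:-cpu=500m,memory=512Mi}" &&
  kubectl rollout status "daemonset/$1" --namespace "${2:-default}"
}
```

For example, `increase_daemonset_resources node-exporter monitoring cpu=250m,memory=512Mi cpu=1,memory=1Gi`, where the DaemonSet and namespace names are placeholders. If the DaemonSet is managed by Helm or a GitOps tool, make the same change in its source manifest so that it is not reverted on the next sync.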