Runbook

Kubernetes StatefulSet Replicas Monitoring Incident


Overview

This incident type covers monitoring the replicas of a Kubernetes StatefulSet, the workload type Kubernetes provides for stateful applications. The incident is triggered when more than one of the StatefulSet's replica pods is down, which makes manual operations unsafe. It is critical and requires immediate attention to restore the affected stateful applications.

Parameters

Debug

Get the desired number of replicas for the specified StatefulSet
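One way to sketch this step with `kubectl`, assuming the desired count is read from the StatefulSet's `.spec.replicas` field. The helper name `sts_desired_replicas` and its two arguments (StatefulSet name, namespace) are hypothetical and not part of the original runbook.

```shell
# Hypothetical helper: print the desired replica count of a StatefulSet.
# $1 = StatefulSet name, $2 = namespace (both supplied by the operator).
sts_desired_replicas() {
  kubectl get statefulset "$1" --namespace "$2" \
    --output jsonpath='{.spec.replicas}'
}

# Example call (placeholder names): sts_desired_replicas web default
```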

Get the number of ready replicas for the specified StatefulSet
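A possible form of this check, assuming the ready count comes from `.status.readyReplicas`. That field may be absent from the output when no pod is ready, so this sketch falls back to 0. The helper name `sts_ready_replicas` is hypothetical.

```shell
# Hypothetical helper: print the number of ready replicas of a StatefulSet.
# $1 = StatefulSet name, $2 = namespace.
sts_ready_replicas() {
  ready=$(kubectl get statefulset "$1" --namespace "$2" \
    --output jsonpath='{.status.readyReplicas}')
  # An empty result is treated as zero ready replicas.
  echo "${ready:-0}"
}

# Example call (placeholder names): sts_ready_replicas web default
```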

Get the number of currently running replicas for the specified StatefulSet
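The runbook does not say which status field this step reads. The sketch below assumes `.status.currentReplicas`, which counts pods created from the StatefulSet's current revision; `.status.replicas` (all pods created by the controller) may be the better fit in some setups. The helper name `sts_current_replicas` is hypothetical.

```shell
# Hypothetical helper: print the current replica count of a StatefulSet.
# Assumption: "currently running" maps to .status.currentReplicas.
# $1 = StatefulSet name, $2 = namespace.
sts_current_replicas() {
  current=$(kubectl get statefulset "$1" --namespace "$2" \
    --output jsonpath='{.status.currentReplicas}')
  # An empty result is treated as zero.
  echo "${current:-0}"
}

# Example call (placeholder names): sts_current_replicas web default
```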

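Get the number of replicas that are currently unavailable for the specified StatefulSet. The StatefulSet status does not report an unavailable count directly, so derive it as the desired count minus the ready count.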
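Since a StatefulSet's status has no unavailable-replicas field, this sketch approximates the number as desired replicas minus ready replicas. Treating "unavailable" as "not ready" is an assumption, and the helper name `sts_unavailable_replicas` is hypothetical.

```shell
# Hypothetical helper: print desired minus ready replicas for a StatefulSet.
# $1 = StatefulSet name, $2 = namespace.
sts_unavailable_replicas() {
  desired=$(kubectl get statefulset "$1" --namespace "$2" \
    --output jsonpath='{.spec.replicas}')
  ready=$(kubectl get statefulset "$1" --namespace "$2" \
    --output jsonpath='{.status.readyReplicas}')
  # Empty values (for example, no ready pods) are treated as zero.
  echo $(( ${desired:-0} - ${ready:-0} ))
}

# Example call (placeholder names): sts_unavailable_replicas web default
```

A result greater than 1 corresponds to the condition that triggers this incident.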

Get the status of all the pods belonging to the specified StatefulSet
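A minimal sketch of listing those pods, assuming they can be matched with a label selector. The selector value (for example `app=web`) depends on how the StatefulSet's pod template is labelled and is an assumption here, as is the helper name `sts_pod_status`.

```shell
# Hypothetical helper: list the pods of a StatefulSet with their status.
# $1 = label selector matching the StatefulSet's pods, $2 = namespace.
sts_pod_status() {
  kubectl get pods --namespace "$2" --selector "$1" --output wide
}

# Example call (placeholder names): sts_pod_status app=web default
```

The STATUS and RESTARTS columns show which replicas are pending, crash-looping, or terminated.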

Get the logs of the specified pod
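For reference, fetching logs might look like the sketch below. The helper names and the 100-line tail are illustrative choices, not taken from the runbook. The second helper uses `--previous`, which returns the logs of the last terminated container and is often the more useful view for a pod that keeps restarting.

```shell
# Hypothetical helpers: print recent logs of a pod.
# $1 = pod name, $2 = namespace.
pod_logs() {
  kubectl logs "$1" --namespace "$2" --tail=100
}

# Logs of the previous (crashed or restarted) container instance.
pod_previous_logs() {
  kubectl logs "$1" --namespace "$2" --previous --tail=100
}

# Example call (placeholder names): pod_logs web-0 default
```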

Resource constraints: CPU, memory, or disk space shortages can cause StatefulSet replicas to stop functioning, which can trigger this incident.
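To check for resource pressure, something along these lines may help. The helper name `pod_resource_check` is hypothetical, and `kubectl top` only returns data when a metrics provider such as metrics-server is installed in the cluster.

```shell
# Hypothetical helper: gather resource-related evidence for one pod.
# $1 = pod name, $2 = namespace.
pod_resource_check() {
  # Shows container states (for example a last state of OOMKilled)
  # and recent events (for example FailedScheduling or Evicted).
  kubectl describe pod "$1" --namespace "$2"
  # Live CPU and memory usage; requires a metrics provider.
  kubectl top pod "$1" --namespace "$2"
  # A PersistentVolumeClaim stuck in Pending points to a storage problem.
  kubectl get pvc --namespace "$2"
}

# Example call (placeholder names): pod_resource_check web-0 default
```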

Network issues: DNS resolution failures, network connectivity problems, or firewall configuration errors can cause StatefulSet replicas to stop functioning, which can trigger this incident.
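A rough way to probe for network problems is sketched below. It assumes the StatefulSet is fronted by a Service whose name the operator knows, and that the container image ships `nslookup`; neither is guaranteed. The helper name `sts_network_check` is hypothetical.

```shell
# Hypothetical helper: basic service and DNS checks for a StatefulSet pod.
# $1 = pod name, $2 = namespace, $3 = Service name.
sts_network_check() {
  # An empty endpoint list means no ready pod is backing the Service.
  kubectl get endpoints "$3" --namespace "$2"
  # Resolve the Service name from inside the pod (needs nslookup in the image).
  kubectl exec "$1" --namespace "$2" -- nslookup "$3"
}

# Example call (placeholder names): sts_network_check web-0 default web
```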

Repair

Scale up the number of replicas until the StatefulSet reaches its desired state and the workload is available again.
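The repair step could be sketched as follows. The helper names, the replica count passed in, and the five-minute timeout are illustrative assumptions. `kubectl rollout status` for StatefulSets is only supported with the RollingUpdate update strategy.

```shell
# Hypothetical helper: set the replica count of a StatefulSet.
# $1 = StatefulSet name, $2 = namespace, $3 = target replica count.
sts_scale() {
  kubectl scale statefulset "$1" --namespace "$2" --replicas="$3"
}

# Hypothetical helper: wait until the StatefulSet's pods are ready.
# $1 = StatefulSet name, $2 = namespace.
sts_wait_ready() {
  kubectl rollout status statefulset "$1" --namespace "$2" --timeout=5m
}

# Example calls (placeholder names):
#   sts_scale web default 3
#   sts_wait_ready web default
```

Scaling only helps once the cause of the failing pods has been addressed; new replicas of a StatefulSet are created in order and wait for their predecessors to become ready under the default pod management policy.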
