This incident type covers monitoring the replicas of a Kubernetes StatefulSet, the workload type Kubernetes uses for stateful applications. The incident is triggered when more than one replica pod is down, which makes manual operations on the StatefulSet unsafe. It is critical and needs immediate attention so that the stateful application keeps functioning.
Parameters
Debug
Get the desired number of replicas for the specified StatefulSet
Get the number of ready replicas for the specified StatefulSet
Get the number of currently running replicas for the specified StatefulSet
Get the number of replicas that are currently unavailable for the specified StatefulSet
Get the status of all the pods belonging to the specified StatefulSet
Get the logs of the specified pod
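The debug steps above can be sketched in Python as a thin wrapper around `kubectl`. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names and the `namespace` and `name` arguments are hypothetical, and `kubectl` is assumed to be on the PATH and pointed at the right cluster. The StatefulSet status does not expose an unavailable-replica count the way a Deployment does, so the sketch derives it as desired minus ready, which is an assumption about what "unavailable" should mean here. Only `matchLabels` selectors are handled.

```python
import json
import subprocess


def kubectl_json(*args):
    """Run `kubectl <args> -o json` and return the parsed output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def summarize_statefulset(sts):
    """Reduce a StatefulSet object (parsed JSON) to its replica counts."""
    status = sts.get("status", {})
    desired = sts.get("spec", {}).get("replicas", 1)  # API default is 1
    ready = status.get("readyReplicas", 0)
    return {
        "desired": desired,
        "ready": ready,
        "created": status.get("replicas", 0),  # pods the controller has created
        "unavailable": max(desired - ready, 0),  # assumption: desired minus ready
    }


def pod_selector(sts):
    """Build a `-l` selector string from spec.selector.matchLabels."""
    labels = sts["spec"]["selector"].get("matchLabels", {})
    return ",".join(f"{k}={v}" for k, v in sorted(labels.items()))


def pod_phases(pod_list):
    """Map each pod name to its phase (Running, Pending, Failed, ...)."""
    return {
        p["metadata"]["name"]: p["status"].get("phase", "Unknown")
        for p in pod_list.get("items", [])
    }


def debug_statefulset(namespace, name):
    """Run the debug steps against a live cluster and print the results."""
    sts = kubectl_json("get", "statefulset", name, "-n", namespace)
    print(summarize_statefulset(sts))
    pods = kubectl_json("get", "pods", "-n", namespace, "-l", pod_selector(sts))
    phases = pod_phases(pods)
    print(phases)
    for pod, phase in phases.items():
        if phase != "Running":
            # Fetch recent logs only for pods that are not in the Running phase.
            logs = subprocess.run(
                ["kubectl", "logs", pod, "-n", namespace, "--tail=50"],
                capture_output=True, text=True,
            )
            print(f"--- {pod} ---\n{logs.stdout or logs.stderr}")
```

Calling `debug_statefulset("my-namespace", "my-statefulset")` (both names are placeholders) prints the replica summary, the pod phases, and the last 50 log lines of each pod that is not running. A pod in the Running phase can still be unready, so the ready count in the summary is the more reliable signal.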
Resource constraints: CPU, memory, or disk space shortages can stop StatefulSet replicas from functioning and trigger this incident.
Network issues: DNS resolution failures, connectivity problems, or firewall misconfiguration can stop StatefulSet replicas from functioning and trigger this incident.
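Some of these causes leave traces in the pod status itself. The following is a minimal sketch, assuming its input is one pod object as returned by `kubectl get pod <pod> -o json`; the function name `pod_failure_hints` is hypothetical. It surfaces mostly resource-related hints (unschedulable pods, OOM kills, evictions) plus a few waiting reasons. Network problems rarely show up this directly and usually need checks from inside the pod, which are not shown here.

```python
def pod_failure_hints(pod):
    """Return human-readable hints for one pod object (parsed JSON)."""
    hints = []
    status = pod.get("status", {})

    # A pod the scheduler cannot place reports PodScheduled=False/Unschedulable.
    for cond in status.get("conditions", []):
        if (cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"):
            hints.append("unschedulable: " + cond.get("message", "no message"))

    for cs in status.get("containerStatuses", []):
        name = cs.get("name", "?")
        # Look at the previous termination first, then the current state.
        terminated = (cs.get("lastState", {}).get("terminated")
                      or cs.get("state", {}).get("terminated"))
        if terminated and terminated.get("reason") == "OOMKilled":
            hints.append(f"{name}: OOMKilled (out of memory)")
        waiting = cs.get("state", {}).get("waiting")
        if waiting and waiting.get("reason") in (
                "CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"):
            hints.append(f"{name}: waiting ({waiting['reason']})")

    # Evicted pods carry the reason on the pod status itself.
    if status.get("reason") == "Evicted":
        hints.append("evicted: " + status.get("message", "no message"))
    return hints
```

An empty list means none of these particular signals were found, not that the pod is healthy.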
Repair
Scale up the number of replicas of the StatefulSet so that the desired state is reached and the workload is available.
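The scaling step can be sketched as a small wrapper around `kubectl scale`. This is a sketch under assumptions rather than the runbook's own automation: the function names and arguments are hypothetical, and `kubectl` is assumed to be configured for the target cluster. With the default `OrderedReady` pod management policy, the controller creates pods in order and waits for each to be Running and Ready, so scaling up is unlikely to help while an earlier pod is still failing.

```python
import subprocess


def scale_command(namespace, name, replicas):
    """Build the kubectl command that sets the StatefulSet replica count."""
    if not isinstance(replicas, int) or replicas < 0:
        raise ValueError("replicas must be a non-negative integer")
    return ["kubectl", "scale", "statefulset", name,
            f"--replicas={replicas}", "-n", namespace]


def scale_statefulset(namespace, name, replicas):
    """Run the scale command against a live cluster."""
    subprocess.run(scale_command(namespace, name, replicas), check=True)
```

Keeping command construction separate from execution makes it possible to review or log the exact command before anything is run against the cluster.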