Runbook

Kubernetes - Pods Failed

Overview

This incident type covers failed Kubernetes pods. A pod is the smallest deployable unit in Kubernetes and runs one or more containers. Pods can fail for many reasons, including a misconfiguration, a software bug, or a resource constraint. The incident may need immediate attention from a software engineer to resolve the issue and prevent further disruption to the application.

Debug

Get the list of pods in the cluster
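One way to script this step is to wrap `kubectl get pods -o json` and filter on pod phase. The sketch below is illustrative only: the helper names are hypothetical, and it assumes `kubectl` is on the PATH with a working kubeconfig. Phase alone does not catch every failure (a pod in CrashLoopBackOff still reports phase Running), so treat it as a first pass.

```python
import json
import subprocess


def list_pods_cmd(namespace=None):
    """Build the kubectl argv that lists pods as JSON (all namespaces by default)."""
    cmd = ["kubectl", "get", "pods", "-o", "json"]
    cmd += ["-n", namespace] if namespace else ["--all-namespaces"]
    return cmd


def unhealthy_pods(pod_list):
    """Return (namespace, name, phase) for pods that are not Running or Succeeded."""
    result = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            meta = pod["metadata"]
            result.append((meta.get("namespace", "default"), meta["name"], phase))
    return result


def fetch_unhealthy_pods(namespace=None):
    """Run kubectl and filter its output; needs access to a live cluster."""
    out = subprocess.run(
        list_pods_cmd(namespace), check=True, capture_output=True, text=True
    ).stdout
    return unhealthy_pods(json.loads(out))
```

Calling `fetch_unhealthy_pods()` with no argument scans every namespace; passing a namespace name narrows the search.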

Check the logs of a pod
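A minimal sketch of this step, assuming `kubectl` is installed and configured. The function names and the default tail length are hypothetical. The `--previous` flag asks for the logs of the last terminated instance of a container, which is usually the interesting one after a crash and restart.

```python
import subprocess


def logs_cmd(pod, namespace="default", container=None, previous=False, tail=200):
    """Build the kubectl argv that prints the last `tail` log lines of a pod."""
    cmd = ["kubectl", "logs", pod, "-n", namespace, f"--tail={tail}"]
    if container:
        # Needed when the pod runs more than one container.
        cmd += ["-c", container]
    if previous:
        # Logs from the previous (crashed) container instance.
        cmd.append("--previous")
    return cmd


def fetch_logs(pod, **kwargs):
    """Run the command and return the log text; needs access to a live cluster."""
    return subprocess.run(
        logs_cmd(pod, **kwargs), check=True, capture_output=True, text=True
    ).stdout
```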

Describe a pod to get more details about its state and configuration
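The describe step could be scripted roughly as follows. This is a sketch under the assumption that `kubectl` is available; the helper names are hypothetical. The second command lists the events recorded for the pod, sorted by time, which is often where scheduling and image-pull errors show up.

```python
import subprocess


def describe_pod_cmd(pod, namespace="default"):
    """Build the kubectl argv that describes a single pod."""
    return ["kubectl", "describe", "pod", pod, "-n", namespace]


def pod_events_cmd(pod, namespace="default"):
    """Build the kubectl argv that lists events involving the pod, oldest first."""
    return [
        "kubectl", "get", "events", "-n", namespace,
        "--field-selector", f"involvedObject.name={pod}",
        "--sort-by=.lastTimestamp",
    ]


def run(cmd):
    """Run a command and return its stdout; needs access to a live cluster."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```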

Check the status of the pod's containers
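Container status lives under `status.containerStatuses` in the pod object returned by `kubectl get pod <name> -o json`. The sketch below summarizes that field; the function name and the shape of the returned dictionaries are illustrative choices, not part of any Kubernetes API.

```python
def container_states(pod):
    """Summarize each container in a pod object parsed from `kubectl get pod -o json`."""
    summaries = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        # `state` holds a single key: "waiting", "running", or "terminated".
        kind = next(iter(state), "unknown")
        detail = state.get(kind, {})
        summaries.append({
            "name": cs["name"],
            "state": kind,
            # For example CrashLoopBackOff or ImagePullBackOff; empty while running.
            "reason": detail.get("reason", ""),
            "ready": cs.get("ready", False),
            "restarts": cs.get("restartCount", 0),
        })
    return summaries
```

A high restart count together with a waiting state usually points at a container that starts and then crashes repeatedly.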

Check the resource usage of a pod
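Resource usage is typically read with `kubectl top pod`, which requires the metrics-server add-on in the cluster. The sketch below builds that command and converts its output into numbers. The helper names are hypothetical, the four-column layout (pod, container, CPU, memory) is an assumption about the `--containers --no-headers` output, and only the binary memory suffixes (Ki, Mi, Gi, Ti) are handled.

```python
def top_pod_cmd(pod, namespace="default"):
    """Build the kubectl argv that reports per-container CPU and memory usage."""
    return ["kubectl", "top", "pod", pod, "-n", namespace, "--containers", "--no-headers"]


def parse_cpu(quantity):
    """Convert a CPU quantity to cores: '250m' is 0.25 cores, '2' is 2.0 cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


_MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity):
    """Convert a memory quantity with a binary suffix (or plain bytes) to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-2]) * factor)
    return int(quantity)


def parse_top_lines(text):
    """Parse lines shaped like 'POD CONTAINER CPU MEMORY' into tuples of numbers."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        pod, container, cpu, memory = parts
        rows.append((pod, container, parse_cpu(cpu), parse_memory(memory)))
    return rows
```

Comparing these numbers with the limits in the pod spec shows how close each container is running to its ceiling.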

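Resource constraints: A container that exceeds its memory limit is terminated and reported as OOMKilled. A container that exceeds its CPU limit is throttled rather than terminated. Pods may also be evicted when the node they run on comes under memory or disk pressure.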
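When a container is killed for exceeding its memory limit, Kubernetes records a terminated state with the reason `OOMKilled` in the pod status. The sketch below looks for that reason in a pod object parsed from `kubectl get pod <name> -o json` and pairs it with the configured memory limit. The function name is hypothetical.

```python
def oom_killed_containers(pod):
    """Return (container name, memory limit or None) for containers that were OOMKilled."""
    # Map each container to its configured memory limit, if any.
    limits = {
        c["name"]: c.get("resources", {}).get("limits", {}).get("memory")
        for c in pod.get("spec", {}).get("containers", [])
    }
    killed = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # Check the current state and the previous one (after a restart).
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                killed.append((cs["name"], limits.get(cs["name"])))
                break
    return killed
```

If a container appears in the result, raising its memory limit or reducing the application's memory footprint are the usual next steps to evaluate.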

Repair

Verify that the Kubernetes cluster has enough resources available for the pods to run. Check CPU, memory, and storage usage on the nodes, and make sure that the pods are not being starved of resources.
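Part of this check can be automated by reading node conditions from `kubectl get nodes -o json`. The sketch below flags nodes that report memory, disk, or PID pressure, or that are not Ready. The function names are hypothetical, and it assumes `kubectl` is installed and configured. It complements, rather than replaces, a look at `kubectl top nodes` and `kubectl describe nodes`.

```python
import json
import subprocess

PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure")


def nodes_cmd():
    """Build the kubectl argv that lists nodes as JSON."""
    return ["kubectl", "get", "nodes", "-o", "json"]


def node_problems(node_list):
    """Map node name to the list of problem conditions it currently reports."""
    problems = {}
    for node in node_list.get("items", []):
        issues = []
        for cond in node.get("status", {}).get("conditions", []):
            if cond["type"] in PRESSURE_CONDITIONS and cond["status"] == "True":
                issues.append(cond["type"])
            elif cond["type"] == "Ready" and cond["status"] != "True":
                issues.append("NotReady")
        if issues:
            problems[node["metadata"]["name"]] = issues
    return problems


def fetch_node_problems():
    """Run kubectl and report problem nodes; needs access to a live cluster."""
    out = subprocess.run(nodes_cmd(), check=True, capture_output=True, text=True).stdout
    return node_problems(json.loads(out))
```

An empty result means no node reports pressure. Pods can still be unschedulable if their resource requests exceed what any single node has left to allocate.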
