Runbook

Kubernetes Nodes with MemoryPressure incident

Overview

The Kubernetes Nodes with MemoryPressure incident occurs when a cluster node is running low on available memory and reports the MemoryPressure condition. A common cause is a memory leak in an application running on the node. The incident needs prompt attention, because a node under memory pressure may evict pods, which can lead to downtime. DevOps teams typically detect it through monitoring and alerting tools such as PagerDuty, so that memory pressure can be identified and addressed quickly.

Parameters

Debug

List all the nodes in the Kubernetes cluster
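A minimal sketch of this step, assuming `kubectl` is installed and its current context points at the affected cluster. The function name `list_nodes` is illustrative only.

```shell
# List every node with its status, roles, age, version, and IP addresses.
# Use the output to find the name of the node named in the alert.
list_nodes() {
  kubectl get nodes -o wide
}
```

Run `list_nodes` and note the node name for the steps that follow.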

Get detailed information about a specific node
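One way to do this, sketched under the assumption that `kubectl` is configured for the cluster. The function names are illustrative, and the node name is passed as the first argument.

```shell
# Show conditions, capacity, allocatable resources, and recent events for one node.
describe_node() {
  node="${1:?usage: describe_node NODE}"
  kubectl describe node "$node"
}

# Print only the status of the MemoryPressure condition: True, False, or Unknown.
node_memory_pressure() {
  node="${1:?usage: node_memory_pressure NODE}"
  kubectl get node "$node" \
    -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}'
}
```

For example, `node_memory_pressure worker-1`, where `worker-1` is a placeholder node name.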

Check the memory usage metrics for the node
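A possible sketch of this check. It assumes the cluster serves the Metrics API (for example through metrics-server); without it, `kubectl top` returns an error. The function name is illustrative.

```shell
# Show current CPU and memory usage for one node, in absolute units and as a percentage.
node_memory_usage() {
  node="${1:?usage: node_memory_usage NODE}"
  kubectl top node "$node"
}
```

For example, `node_memory_usage worker-1`, where `worker-1` is a placeholder node name.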

List all the pods running on the node
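This step can be sketched with a field selector on `spec.nodeName`, assuming `kubectl` is configured for the cluster. The function name is illustrative.

```shell
# List pods from every namespace that are scheduled on the given node.
pods_on_node() {
  node="${1:?usage: pods_on_node NODE}"
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$node"
}
```

For example, `pods_on_node worker-1`, where `worker-1` is a placeholder node name.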

Get detailed information about a specific pod
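A minimal sketch, assuming `kubectl` is configured for the cluster. The function names are illustrative. The pod name is the first argument and the namespace is the second, falling back to `default` when omitted.

```shell
# Show spec, resource requests and limits, container states, and events for one pod.
describe_pod() {
  pod="${1:?usage: describe_pod POD [NAMESPACE]}"
  ns="${2:-default}"
  kubectl describe pod "$pod" -n "$ns"
}

# Print why each container last terminated, if it did.
# A value of OOMKilled means the container was killed for exceeding its memory limit.
last_termination_reason() {
  pod="${1:?usage: last_termination_reason POD [NAMESPACE]}"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}
```

For example, `last_termination_reason web-0 prod`, where `web-0` and `prod` are placeholder names.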

Check the memory usage metrics for the pod
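A possible sketch of this check. As with node metrics, it assumes the Metrics API is available in the cluster. The function name is illustrative.

```shell
# Show current CPU and memory usage for one pod, broken down per container.
pod_memory_usage() {
  pod="${1:?usage: pod_memory_usage POD [NAMESPACE]}"
  ns="${2:-default}"
  kubectl top pod "$pod" -n "$ns" --containers
}
```

For example, `pod_memory_usage web-0 prod`, where `web-0` and `prod` are placeholder names. Comparing the reported usage with the memory limit shown by `kubectl describe pod` indicates how close each container is to its limit.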

Check the logs for the pod to see if there are any memory leak errors
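One way to scan the logs, sketched under the assumption that the application writes recognizable out-of-memory messages. The search patterns and the function name are assumptions; adjust them to what the application actually logs.

```shell
# Search the most recent log lines of a pod for common out-of-memory messages.
# Prints matching lines; prints nothing when there is no match.
pod_memory_errors() {
  pod="${1:?usage: pod_memory_errors POD [NAMESPACE]}"
  ns="${2:-default}"
  kubectl logs "$pod" -n "$ns" --tail=1000 \
    | grep -i -E 'out of memory|outofmemory|oomkilled|cannot allocate memory' \
    || true
}
```

For example, `pod_memory_errors web-0 prod`, where `web-0` and `prod` are placeholder names. If the container has already restarted, `kubectl logs` with the `--previous` flag retrieves the logs of the earlier instance.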

Delete and recreate the pod to see if that resolves the memory pressure issue
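A minimal sketch of this step. It assumes the pod is managed by a controller such as a Deployment, StatefulSet, or DaemonSet, which creates a replacement automatically; a standalone pod is not recreated after deletion and would have to be re-applied from its manifest. The function name is illustrative.

```shell
# Delete a pod so that its controller creates a fresh replacement,
# then list the pods in the namespace to confirm the replacement is starting.
recreate_pod() {
  pod="${1:?usage: recreate_pod POD [NAMESPACE]}"
  ns="${2:-default}"
  kubectl delete pod "$pod" -n "$ns"
  kubectl get pods -n "$ns" -o wide
}
```

For example, `recreate_pod web-0 prod`, where `web-0` and `prod` are placeholder names. If memory usage on the node climbs again after the restart, the leak is likely still present and the restart only bought time.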

Also consider whether the Kubernetes cluster is under-provisioned: if the resources allocated to the cluster are insufficient for the workload, nodes will come under memory pressure even without a memory leak.
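To judge whether provisioning is the problem, it can help to compare each node's memory with what its pods request. The sketch below assumes `kubectl` is configured for the cluster; the function names and the number of lines shown after the section heading are illustrative choices.

```shell
# Show total and allocatable memory for every node.
node_memory_capacity() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CAPACITY:.status.capacity.memory,ALLOCATABLE:.status.allocatable.memory'
}

# Show the "Allocated resources" section of a node description,
# which summarizes the requests and limits of the pods scheduled on the node.
node_allocated_resources() {
  node="${1:?usage: node_allocated_resources NODE}"
  kubectl describe node "$node" | grep -A 10 'Allocated resources'
}
```

If memory requests are close to the allocatable amount on most nodes, adding nodes or using larger nodes is likely to help more than restarting pods.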

Repair

Identify and troubleshoot memory leaks in applications running on the node.
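A leak typically shows up as memory usage that keeps growing over time. The sketch below is one way to look for that pattern, assuming the Metrics API is available in the cluster. The function names, the default sample count, and the default interval are illustrative.

```shell
# List pods across all namespaces, ordered by memory usage, highest first.
top_memory_pods() {
  kubectl top pods --all-namespaces --sort-by=memory
}

# Sample the memory usage of one pod several times.
# Steadily increasing values across samples suggest a leak.
watch_pod_memory() {
  pod="${1:?usage: watch_pod_memory POD [NAMESPACE] [SAMPLES] [INTERVAL_SECONDS]}"
  ns="${2:-default}"
  samples="${3:-5}"
  interval="${4:-60}"
  i=1
  while [ "$i" -le "$samples" ]; do
    printf 'sample %s of %s\n' "$i" "$samples"
    kubectl top pod "$pod" -n "$ns" --containers
    sleep "$interval"
    i=$((i + 1))
  done
}
```

For example, `watch_pod_memory web-0 prod 10 30` takes ten samples thirty seconds apart, where `web-0` and `prod` are placeholder names. Once a leaking container is identified, the fix belongs in the application itself; setting a memory limit on the container keeps a leak from exhausting the whole node in the meantime.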
