---
id: 2386078a-30ec-46a9-a950-72923fab1652
---

# Kubernetes Nodes with MemoryPressure Incident
---

The Kubernetes Nodes with MemoryPressure incident occurs when a cluster node is running low on memory and reports the `MemoryPressure` condition. Common causes are a memory leak in an application running on the node or a cluster that is under-provisioned for its workload. The incident needs prompt attention, because the kubelet can start evicting pods from a node under memory pressure, which may lead to downtime. DevOps teams typically detect it through monitoring and alerting tools such as PagerDuty.

### Parameters
```shell
# Environment Variables

export NODE_NAME="PLACEHOLDER"

export POD_NAME="PLACEHOLDER"

export POD_MANIFEST_FILE="PLACEHOLDER"

export MEMORY_SIZE="PLACEHOLDER"

export NAMESPACE="PLACEHOLDER"

export APPLICATION_NAME="PLACEHOLDER"
```

## Debug

### List all the nodes in the Kubernetes cluster
```shell
kubectl get nodes
```

### Get detailed information about a specific node
```shell
kubectl describe node ${NODE_NAME}
```
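The `describe` output is long, and the line that matters for this incident is the `MemoryPressure` entry under `Conditions`. As a minimal sketch, the condition can be read directly with a JSONPath filter. The function name `memory_pressure_status` is a hypothetical helper introduced for illustration and is not part of the original runbook.

```shell
#!/bin/bash

# Hypothetical helper: print the status of the MemoryPressure condition
# ("True", "False", or "Unknown") for the node named in the first argument.
memory_pressure_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}'
}

# Example call (requires access to a cluster):
#   memory_pressure_status "${NODE_NAME}"
```

A result of `True` means the node is currently reporting memory pressure.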

### Check the memory usage metrics for the node
```shell
kubectl top node ${NODE_NAME}
```
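To compare all nodes at once, the output of `kubectl top nodes` can be filtered by its `MEMORY%` column. The sketch below assumes the usual column order (`NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%`) and a running metrics-server. The helper name `nodes_above_memory_threshold` and the threshold of 80 in the example call are illustrative assumptions.

```shell
#!/bin/bash

# Hypothetical helper: read `kubectl top nodes --no-headers` output on stdin
# and print the nodes whose MEMORY% column is at or above the threshold ($1).
nodes_above_memory_threshold() {
  awk -v limit="$1" '
    {
      pct = $5
      sub(/%$/, "", pct)   # "87%" -> "87"
      # Skip rows without a numeric value (e.g. nodes with no metrics yet)
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $1, $5
    }'
}

# Example call (requires metrics-server):
#   kubectl top nodes --no-headers | nodes_above_memory_threshold 80
```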

### List all the pods running on the node
```shell
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${NODE_NAME}
```

### Get detailed information about a specific pod
```shell
kubectl describe pod ${POD_NAME} -n ${NAMESPACE}
```

### Check the memory usage metrics for the pod
```shell
kubectl top pod ${POD_NAME} -n ${NAMESPACE}
```
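For a pod with several containers, `kubectl top pod` accepts `--containers` to break usage down per container. The following sketch picks the container with the highest memory figure from that output. It assumes the columns are `POD NAME CPU(cores) MEMORY(bytes)` and that all memory values share one unit (typically `Mi`). The helper name `heaviest_container` is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: read `kubectl top pod --containers --no-headers`
# output on stdin and print the container with the largest memory value.
# Assumes every value in column 4 uses the same unit (e.g. Mi).
heaviest_container() {
  awk '
    {
      mem = $4 + 0                      # numeric prefix of e.g. "340Mi"
      if (mem > max) { max = mem; name = $2; raw = $4 }
    }
    END { if (name != "") print name, raw }'
}

# Example call (requires metrics-server):
#   kubectl top pod "${POD_NAME}" -n "${NAMESPACE}" --containers --no-headers | heaviest_container
```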

### Check the logs for the pod to see if there are any memory leak errors
```shell
kubectl logs ${POD_NAME} -n ${NAMESPACE}
```

### Check the Kubernetes events for the pod to see if there are any memory-related issues
```shell
kubectl get events -n ${NAMESPACE} --field-selector involvedObject.name=${POD_NAME}
```
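Events expire after a while, so it can also help to check whether a container in the pod was previously terminated for exceeding its memory limit. A minimal sketch, assuming the pod status still records the last termination: the helpers `last_termination_reasons` and `was_oom_killed` are hypothetical names introduced here.

```shell
#!/bin/bash

# Hypothetical helper: print "<container> <reason>" for each container of
# pod $1 in namespace $2. The reason is empty if the container has no
# recorded previous termination.
last_termination_reasons() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.lastState.terminated.reason}{"\n"}{end}'
}

# Hypothetical helper: succeed if any line on stdin reports OOMKilled.
was_oom_killed() {
  grep -q ' OOMKilled$'
}

# Example call (requires access to a cluster):
#   last_termination_reasons "${POD_NAME}" "${NAMESPACE}" | was_oom_killed \
#     && echo "A container in ${POD_NAME} was OOMKilled"
```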

### Delete and recreate the pod to see if that resolves the memory pressure issue
```shell
kubectl delete pod ${POD_NAME} -n ${NAMESPACE}

kubectl apply -f ${POD_MANIFEST_FILE}
```

### The Kubernetes cluster may be under-provisioned, meaning that the resources allocated to the cluster are insufficient to handle the workload, leading to memory pressure.
```shell
#!/bin/bash

# Sum Kubernetes memory quantities (Ki, Mi, Gi suffixes) from stdin, in MiB.
# awk uses the leading digits of each value; the suffix selects the unit.
sum_mib() {
  awk '/Ki$/ {s += $1 / 1024} /Mi$/ {s += $1} /Gi$/ {s += $1 * 1024} END {printf "%d", s}'
}

# Get the percentage of memory used on each node (requires metrics-server)
# Columns of `kubectl top nodes`: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kubectl top nodes --no-headers | while read -r NODE CPU_CORES CPU_PERCENTAGE MEMORY_BYTES MEMORY_PERCENTAGE
do
  echo "Node $NODE is using ${MEMORY_PERCENTAGE} of memory"
done

# Get the total memory capacity of the Kubernetes cluster
MEMORY_CAPACITY=$(kubectl get nodes -o jsonpath='{range .items[*]}{.status.capacity.memory}{"\n"}{end}' | sum_mib)
echo "The Kubernetes cluster has a total memory capacity of $((MEMORY_CAPACITY / 1024)) GiB"

# Calculate the total amount of memory used in the Kubernetes cluster
MEMORY_USED=$(kubectl top nodes --no-headers | awk '{print $4}' | sum_mib)
echo "The Kubernetes cluster is currently using $((MEMORY_USED / 1024)) GiB of memory"

# Calculate the percentage of memory used in the Kubernetes cluster
if [ "$MEMORY_CAPACITY" -le 0 ]
then
  echo "Could not determine the cluster memory capacity" >&2
  exit 1
fi
MEMORY_PERCENTAGE=$((MEMORY_USED * 100 / MEMORY_CAPACITY))
echo "The Kubernetes cluster is using ${MEMORY_PERCENTAGE}% of memory capacity"

# Check if the memory usage is close to the memory capacity
THRESHOLD=90
if [ "$MEMORY_PERCENTAGE" -ge "$THRESHOLD" ]
then
  echo "The Kubernetes cluster may be under-provisioned, as the memory usage is above ${THRESHOLD}% threshold"
else
  echo "The Kubernetes cluster memory usage is within normal range"
fi
```

## Repair

### Identify and troubleshoot memory leaks in applications running on the node.
```shell
#!/bin/bash

# NAMESPACE and POD_NAME come from the environment variables in Parameters
POD=${POD_NAME}

# Get the name of the first container in the pod
CONTAINER=$(kubectl -n "$NAMESPACE" get po "$POD" -o jsonpath='{.spec.containers[0].name}')

# Get the logs for the container
LOGS=$(kubectl -n "$NAMESPACE" logs "$POD" -c "$CONTAINER")

# Search the logs for any indications of a memory leak
# (matches the literal phrase "memory leak", case-insensitively)
if echo "$LOGS" | grep -qi "memory leak"; then

    # Take the last word of the first matching line as the application name;
    # this assumes the application logs its name at the end of that line
    APPLICATION=$(echo "$LOGS" | grep -i -m 1 "memory leak" | awk '{print $NF}')

    # Force-delete the pod. This skips graceful termination, and the pod is
    # only recreated if a controller (e.g. a Deployment) manages it
    kubectl -n "$NAMESPACE" delete po "$POD" --grace-period=0 --force

    echo "Deleted pod $POD (container $CONTAINER, application $APPLICATION) due to memory leak"

else

    echo "No memory leaks detected in container $CONTAINER"

fi
```
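The Parameters section defines `MEMORY_SIZE` and `APPLICATION_NAME`, which suggests capping the memory of the leaking workload as a follow-up step. The sketch below is one way to do that with `kubectl set resources`. It assumes `APPLICATION_NAME` refers to a Deployment, which the runbook does not state, and the helper name `set_memory_limit` is hypothetical. Request and limit are set to the same value so that the new limit cannot end up below an existing request.

```shell
#!/bin/bash

# Hypothetical helper: set the memory request and limit on all containers
# of a Deployment (assumption: the application is a Deployment).
#   $1 = namespace, $2 = deployment name, $3 = memory quantity (e.g. 512Mi)
set_memory_limit() {
  kubectl -n "$1" set resources deployment "$2" \
    --requests="memory=$3" --limits="memory=$3"
}

# Example call (requires access to a cluster):
#   set_memory_limit "${NAMESPACE}" "${APPLICATION_NAME}" "${MEMORY_SIZE}"
#   kubectl -n "${NAMESPACE}" rollout status deployment "${APPLICATION_NAME}"
```

Changing the resources triggers a rollout, so the pods are replaced with the new limit applied.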
