Runbook

Host Out of Memory (OOM) Incident


Overview

A host out-of-memory (OOM) incident occurs when a server runs out of memory, causing it to crash or become unresponsive. Typical causes include an unexpected surge in traffic or too few resources allocated to the system. Resolving the incident means identifying the root cause of the memory pressure and then either optimizing resource usage or increasing memory capacity.

Parameters

Debug

Check the amount of free memory
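A minimal sketch of this check, assuming a Linux host that exposes /proc/meminfo. The helper name mem_available_mib is illustrative and not part of the original runbook.

```shell
# Print MemAvailable in MiB from a meminfo-style file (default: /proc/meminfo).
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Overall summary from `free`, when it is installed.
if command -v free >/dev/null 2>&1; then
  free -h || true
fi

# MemAvailable is the kernel's estimate of memory that can be used without swapping.
if [ -r /proc/meminfo ]; then
  echo "MemAvailable: $(mem_available_mib) MiB"
fi
```

A low MemAvailable value relative to MemTotal is usually a better warning sign than the "free" column alone, because the page cache can be reclaimed.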

Check the memory used by each process
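One way to do this, sketched under the assumption that `ps` is available on the host. The helper rank_by_rss is a hypothetical name; it sorts lines by their third column, which here holds the resident set size (RSS) in KiB.

```shell
# Sort "pid user rss command" lines by RSS (3rd column), largest first, keep the top N.
rank_by_rss() {
  sort -k3,3nr | head -n "${1:-10}"
}

# List every process without a header line, then rank by resident memory.
if command -v ps >/dev/null 2>&1; then
  ps -A -o pid= -o user= -o rss= -o comm= | rank_by_rss 10
fi
```

RSS counts shared pages once per process, so the column total can exceed physical memory; treat it as a ranking rather than an exact accounting.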

Check the journalctl logs for any out-of-memory errors
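A sketch of this check, assuming a systemd host where kernel messages are available through journalctl. The filter patterns are a reasonable guess at what the kernel OOM killer logs, not an exhaustive list, and filter_oom_lines is an illustrative helper name.

```shell
# Keep only lines that look like kernel OOM-killer activity.
filter_oom_lines() {
  grep -iE 'out of memory|oom-killer|killed process'
}

# Scan kernel messages from the last 24 hours.
if command -v journalctl >/dev/null 2>&1; then
  journalctl -k --no-pager --since=-24h 2>/dev/null | filter_oom_lines \
    || echo "No OOM events found in the last 24 hours"
fi
```

The matching lines normally name the process that was killed, which gives a starting point for the per-process checks.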

Check the garbage collector logs for any errors
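The runbook does not say which runtime is in use, so the following is only a sketch under a loud assumption: a JVM application that writes GC logs to a file. The path in GC_LOG is hypothetical, and the patterns match the "Full GC" and "Pause Full" markers that JVM GC logs commonly contain. Adjust both for other runtimes.

```shell
# Count full-collection events in a GC log; frequent full GCs suggest heap pressure.
count_full_gc() {
  grep -cE 'Pause Full|Full GC' "$1" || true
}

# Hypothetical log location; set GC_LOG to the real path for your application.
GC_LOG="${GC_LOG:-/var/log/app/gc.log}"
if [ -r "$GC_LOG" ]; then
  echo "Full GC events in $GC_LOG: $(count_full_gc "$GC_LOG")"
fi
```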

Check the process limits for the user running the process
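A sketch of this check, assuming a Linux host. `ulimit -a` shows the limits of the current shell, while /proc/<pid>/limits shows those of a running process. The helper soft_limit is an illustrative name that extracts one soft limit from a limits file.

```shell
# Print the soft limit for a named row (e.g. "Max address space") of a limits file.
# Columns in /proc/<pid>/limits are separated by runs of two or more spaces.
soft_limit() {
  awk -F '  +' -v name="$1" '$1 == name { print $2 }' "${2:-/proc/$$/limits}"
}

# Limits that apply to the current shell and to anything it starts.
ulimit -a

# Limits of this shell process as recorded by the kernel.
if [ -r "/proc/$$/limits" ]; then
  echo "Max address space (soft): $(soft_limit 'Max address space')"
  echo "Max resident set (soft):  $(soft_limit 'Max resident set')"
fi
```

To inspect the affected service rather than the shell, pass its limits file as the second argument, for example `soft_limit 'Max address space' /proc/<pid>/limits`, and run it as the user that owns the process.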

Check the system limits for the amount of memory available
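A sketch of this check, assuming a Linux host. It reads the total and committable memory from /proc/meminfo, the overcommit policy, and any cgroup memory limit that applies (both cgroup v2 and v1 paths are tried). The helper meminfo_kb is an illustrative name.

```shell
# Print the value (in kB) of one /proc/meminfo field, e.g. MemTotal.
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  echo "MemTotal:     $(meminfo_kb MemTotal) kB"
  echo "CommitLimit:  $(meminfo_kb CommitLimit) kB"
  echo "Committed_AS: $(meminfo_kb Committed_AS) kB"
fi

# Overcommit policy: 0 = heuristic, 1 = always allow, 2 = strict accounting.
if [ -r /proc/sys/vm/overcommit_memory ]; then
  echo "vm.overcommit_memory = $(cat /proc/sys/vm/overcommit_memory)"
fi

# cgroup memory limit, if this shell runs inside a limited cgroup (v2 path, then v1).
for f in /sys/fs/cgroup/memory.max /sys/fs/cgroup/memory/memory.limit_in_bytes; do
  if [ -r "$f" ]; then
    echo "$f: $(cat "$f")"
  fi
done
```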

Check the swap usage on the host
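A sketch of this check, assuming a Linux host. The helper swap_used_pct is an illustrative name that derives the percentage of swap in use from the SwapTotal and SwapFree fields.

```shell
# Print the percentage of swap in use (0 when no swap is configured).
swap_used_pct() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END { if (total > 0) printf "%d\n", (total - free) * 100 / total; else print 0 }' \
    "${1:-/proc/meminfo}"
}

# Per-device swap usage, when util-linux swapon is installed.
if command -v swapon >/dev/null 2>&1; then
  swapon --show || true
fi

if [ -r /proc/meminfo ]; then
  echo "Swap in use: $(swap_used_pct)%"
fi

# For ongoing swap activity, watch the si/so columns of: vmstat 1 5
```

High swap usage combined with sustained swap-in and swap-out activity indicates the host is short on memory, not merely holding idle pages in swap.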

The host may be running too many applications or processes simultaneously, causing excessive memory usage.

Note

Before you proceed with changing the instance type, please be aware that the current instance will restart during the process. Changing the instance type involves stopping the current instance, resizing its resources, and then starting it again with the new configuration.

Change the AWS instance type using the AWS CLI
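A sketch of the stop, resize, start sequence, assuming an EBS-backed EC2 instance and an AWS CLI that is already configured with credentials and a region. The function name resize_ec2_instance and the example instance ID and type are placeholders, not values from this runbook.

```shell
# Stop the instance, change its type, and start it again.
resize_ec2_instance() {
  local id="$1" type="$2"
  # The instance type can only be changed while the instance is stopped.
  aws ec2 stop-instances --instance-ids "$id" || return 1
  aws ec2 wait instance-stopped --instance-ids "$id" || return 1
  aws ec2 modify-instance-attribute --instance-id "$id" \
    --instance-type "{\"Value\": \"$type\"}" || return 1
  aws ec2 start-instances --instance-ids "$id"
}

# Example (placeholder values):
#   resize_ec2_instance i-0123456789abcdef0 m5.xlarge
```

Unless the instance has an Elastic IP, its public IP address may change after the restart.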

Change the size of an Azure VM using the Azure CLI
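A sketch of the resize, assuming the Azure CLI is installed and logged in. The function name resize_azure_vm and the example resource group, VM name, and size are placeholders.

```shell
# List the sizes available to the VM, then resize it.
resize_azure_vm() {
  local rg="$1" vm="$2" size="$3"
  # Sizes the VM can move to on its current hardware cluster.
  az vm list-vm-resize-options --resource-group "$rg" --name "$vm" --output table || return 1
  # Apply the new size; the VM restarts as part of the resize.
  az vm resize --resource-group "$rg" --name "$vm" --size "$size"
}

# Example (placeholder values):
#   resize_azure_vm my-resource-group my-vm Standard_D4s_v3
```

If the target size is missing from the list, the VM may first need to be deallocated with `az vm deallocate` so that it can be placed on different hardware.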

Change the machine type in GCP
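A sketch using the gcloud CLI, assuming it is installed, authenticated, and pointed at the right project. The function name resize_gce_instance and the example instance name, zone, and machine type are placeholders.

```shell
# Stop the instance, set a new machine type, and start it again.
resize_gce_instance() {
  local name="$1" zone="$2" type="$3"
  # The machine type can only be changed while the instance is stopped.
  gcloud compute instances stop "$name" --zone "$zone" || return 1
  gcloud compute instances set-machine-type "$name" --zone "$zone" \
    --machine-type "$type" || return 1
  gcloud compute instances start "$name" --zone "$zone"
}

# Example (placeholder values):
#   resize_gce_instance my-instance us-central1-a e2-standard-4
```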

In Kubernetes, you can change the memory resources for a pod's containers using the kubectl command-line tool. There are two common ways to achieve this: by updating the pod's YAML manifest file or by using kubectl edit.
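Both approaches can be sketched as small wrappers, assuming the pod is managed by a Deployment and kubectl is configured for the cluster. The function names, the manifest path, and the example values are placeholders. The third helper uses `kubectl set resources`, a non-interactive alternative that the text above does not mention.

```shell
# Option 1: edit resources.requests.memory and resources.limits.memory
# in the manifest file, then re-apply it.
apply_manifest() {
  kubectl apply -f "$1"
}

# Option 2: open the live Deployment in an editor and change the same fields.
edit_deployment() {
  kubectl edit deployment "$1" -n "${2:-default}"
}

# Alternative: set memory request and limit directly from the command line.
set_memory() {
  local deploy="$1" request="$2" limit="$3" ns="${4:-default}"
  kubectl set resources deployment "$deploy" -n "$ns" \
    --requests="memory=$request" --limits="memory=$limit"
}

# Examples (placeholder values):
#   apply_manifest deployment.yaml
#   set_memory web 256Mi 512Mi
```

Changing the resources on a Deployment triggers a rolling replacement of its pods, so expect a brief restart of each replica.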
