A Host Out of Memory (OOM) incident occurs when a server or system exhausts its available memory, causing it to crash or become unresponsive. Typical causes include an unexpected surge in traffic or too little memory allocated to the system. Resolving the incident requires identifying which process or workload is consuming the memory, then either reducing its usage or increasing memory capacity.
Parameters
Debug
Check the amount of free memory
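As a minimal sketch of this check on a Linux host, the function below summarises total and available memory from a file in /proc/meminfo format. The name mem_summary and the output layout are illustrative choices, not part of any standard tool.

```shell
# Summarise total and available memory from a meminfo-style file.
# $1 is the file to read; it defaults to /proc/meminfo on the live host.
# MemAvailable is absent on very old kernels, in which case it reports 0.
mem_summary() {
    awk '
        /^MemTotal:/     { total = $2 }   # values are in kB
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0)
                printf "total_mb=%d available_mb=%d available_pct=%d\n",
                       total / 1024, avail / 1024, avail * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

Calling mem_summary with no argument reads the live host. Running free -m gives a similar view in a table.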
Check the amount of used memory by each process
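One way to rank processes by resident memory is to feed ps output through a small filter. The sketch below assumes input rows of the form "PID RSS_KB COMMAND"; the function name top_rss is hypothetical.

```shell
# Read "PID RSS_KB COMMAND" rows on stdin and print the N largest by RSS.
# $1 is N (default 5). RSS is converted from KiB to MiB.
# Only the first word of the command is kept.
top_rss() {
    sort -k2,2nr | head -n "${1:-5}" |
        awk '{ printf "%s %d MiB %s\n", $1, $2 / 1024, $3 }'
}

# Example on a live host (the trailing "=" suppresses the header row):
#   ps -eo pid=,rss=,comm= | top_rss 10
```

RSS counts shared pages once per process, so the figures can add up to more than physical memory. Treat the ranking as a guide to where to look first.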
Check the journalctl logs for any out-of-memory errors
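A possible filter for this step is sketched below. The search patterns are assumptions about wording that commonly appears in OOM messages, and oom_lines is an illustrative name; adjust the patterns to match what your hosts actually log.

```shell
# Keep only log lines that look like OOM events (case-insensitive).
# The patterns are assumptions about common kernel wording.
oom_lines() {
    grep -iE 'out of memory|oom-kill|oom_reaper|killed process'
}

# Example on a systemd host: kernel messages from the last day.
#   journalctl -k --since "1 day ago" --no-pager | oom_lines
```

The filter prints nothing and returns a non-zero status when no line matches, which makes it usable in a script condition.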
Check the garbage collector logs for any errors
Check the process limits for the user running the process
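On Linux, the limits applied to a running process are exposed in /proc/<pid>/limits. The sketch below keeps only the memory-related rows of such a file; mem_limits is a hypothetical helper name.

```shell
# Print the header and the memory-related rows of a limits file
# in /proc/<pid>/limits format.
# $1 is the file; it defaults to /proc/self/limits, which reflects
# the limits inherited from the calling shell.
mem_limits() {
    grep -E '^(Limit|Max (address space|resident set|locked memory|data size|stack size)) ' \
        "${1:-/proc/self/limits}"
}

# Example for a specific process (1234 is a placeholder PID):
#   mem_limits /proc/1234/limits
```

For the limits of the current login session, ulimit -a shows the same information in a different layout.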
Check the system limits for the amount of memory available
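One reading of this step is to compare how much memory the kernel has promised to processes with the commit limit it reports. The sketch below does that from a meminfo-style file; commit_headroom is an illustrative name, and other system-level limits (for example cgroup limits) are outside what it covers.

```shell
# Report the kernel's commit limit and the memory currently committed.
# $1 is the file to read; it defaults to /proc/meminfo.
# Note: CommitLimit is only enforced when vm.overcommit_memory is 2.
commit_headroom() {
    awk '
        /^CommitLimit:/  { limit = $2 }   # values are in kB
        /^Committed_AS:/ { used = $2 }
        END { printf "commit_limit_mb=%d committed_mb=%d\n", limit / 1024, used / 1024 }
    ' "${1:-/proc/meminfo}"
}
```

The active overcommit policy can be read with sysctl vm.overcommit_memory.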
Check the kernel logs for any memory-related errors
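When the kernel's OOM killer has fired, it usually helps to know which process was killed. The sketch below assumes the log contains lines of the form "Killed process <pid> (<name>)", which is the wording recent Linux kernels use; oom_victims is a hypothetical name.

```shell
# Print "PID NAME" for each process the kernel reports having killed.
# Reads kernel log text on stdin; lines without the pattern are dropped.
oom_victims() {
    sed -n -E 's/.*Killed process ([0-9]+) \(([^)]*)\).*/\1 \2/p'
}

# Example on a live host (may require root):
#   dmesg -T | oom_victims
```

If the output is empty, the kernel ring buffer may have rotated; the journal usually keeps older kernel messages.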
Check the swap usage on the host
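Swap totals are also listed in /proc/meminfo. A minimal sketch, with swap_summary as an illustrative function name:

```shell
# Summarise swap usage from a meminfo-style file.
# $1 is the file to read; it defaults to /proc/meminfo.
swap_summary() {
    awk '
        /^SwapTotal:/ { total = $2 }   # values are in kB
        /^SwapFree:/  { free = $2 }
        END {
            if (total == 0) { print "swap: none configured"; exit }
            used = total - free
            printf "swap_total_mb=%d swap_used_mb=%d swap_used_pct=%d\n",
                   total / 1024, used / 1024, used * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

Heavy swap use alongside low available memory suggests the host is under sustained memory pressure rather than a brief spike. swapon --show lists the individual swap devices.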
The host may be running too many applications or processes simultaneously, causing excessive memory usage.
Note
Before you proceed with changing the instance type, please be aware that the current instance will restart during the process. Changing the instance type involves stopping the current instance, resizing its resources, and then starting it again with the new configuration.
Changing the AWS instance type using the AWS CLI
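The stop, resize, and start sequence could be scripted roughly as follows. The function name resize_ec2 and the DRY_RUN switch are illustrative, and the instance ID and type you pass in are your own values; the sketch assumes an EBS-backed instance and a configured AWS CLI.

```shell
# Resize an EC2 instance: stop it, change the type, start it again.
# $1 = instance ID, $2 = target instance type (both supplied by the caller).
# With DRY_RUN set to any non-empty value the commands are printed, not run.
resize_ec2() {
    ${DRY_RUN:+echo} aws ec2 stop-instances --instance-ids "$1" &&
    ${DRY_RUN:+echo} aws ec2 wait instance-stopped --instance-ids "$1" &&
    ${DRY_RUN:+echo} aws ec2 modify-instance-attribute --instance-id "$1" --instance-type "Value=$2" &&
    ${DRY_RUN:+echo} aws ec2 start-instances --instance-ids "$1"
}

# Example (placeholder values): DRY_RUN=1 resize_ec2 i-0123456789abcdef0 m5.xlarge
```

Running it with DRY_RUN=1 first lets you review the exact commands before the instance is actually stopped.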
Changing the size of an Azure VM using the Azure CLI
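A sketch of the Azure equivalent is shown below. It first lists the sizes the VM can move to, then applies the new size. The function name resize_azure_vm and the DRY_RUN switch are illustrative; the resource group, VM name, and size are values you supply.

```shell
# Resize an Azure VM.
# $1 = resource group, $2 = VM name, $3 = target size (supplied by the caller).
# With DRY_RUN set to any non-empty value the commands are printed, not run.
resize_azure_vm() {
    ${DRY_RUN:+echo} az vm list-vm-resize-options --resource-group "$1" --name "$2" --output table &&
    ${DRY_RUN:+echo} az vm resize --resource-group "$1" --name "$2" --size "$3"
}

# Example (placeholder values): DRY_RUN=1 resize_azure_vm my-rg my-vm Standard_D4s_v3
```

If the target size does not appear in the resize options, the VM may need to be deallocated before that size becomes available.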
Changing the machine type in GCP
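On Compute Engine the instance must be stopped before its machine type can be changed. A minimal sketch using the gcloud CLI, with resize_gce_instance and DRY_RUN as illustrative names and all argument values supplied by you:

```shell
# Change the machine type of a Compute Engine instance.
# $1 = instance name, $2 = zone, $3 = target machine type.
# With DRY_RUN set to any non-empty value the commands are printed, not run.
resize_gce_instance() {
    ${DRY_RUN:+echo} gcloud compute instances stop "$1" --zone "$2" &&
    ${DRY_RUN:+echo} gcloud compute instances set-machine-type "$1" --zone "$2" --machine-type "$3" &&
    ${DRY_RUN:+echo} gcloud compute instances start "$1" --zone "$2"
}

# Example (placeholder values): DRY_RUN=1 resize_gce_instance my-vm us-central1-a e2-standard-4
```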
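In Kubernetes, you can change the memory requests and limits for a pod's containers using the kubectl command-line tool. There are two common ways to do this: update the YAML manifest and re-apply it, or use kubectl edit. On most clusters the resources of a running pod cannot be changed in place, so the change is normally made on the owning workload (for example a Deployment), which then recreates its pods.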
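Both routes end up changing the resources.requests.memory and resources.limits.memory fields of the container spec. For scripted use, a third, non-interactive option is kubectl set resources, sketched below. It assumes the pods are managed by a Deployment; set_memory and DRY_RUN are illustrative names, and the Deployment name and sizes are your own values.

```shell
# Set memory request and limit on all containers of a Deployment,
# then wait for the resulting rollout to finish.
# $1 = Deployment name, $2 = memory request, $3 = memory limit.
# With DRY_RUN set to any non-empty value the commands are printed, not run.
set_memory() {
    ${DRY_RUN:+echo} kubectl set resources deployment "$1" --requests="memory=$2" --limits="memory=$3" &&
    ${DRY_RUN:+echo} kubectl rollout status "deployment/$1"
}

# Example (placeholder values): DRY_RUN=1 set_memory web 512Mi 1Gi
```

Changing the values triggers a rolling restart of the Deployment's pods, so expect brief churn while the new pods come up.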