
Intermittent JVM Memory Issues


Highlights

Customer Experience Impact:

Up to 30 minutes of poor performance

Frequency of Occurrence:

Frequently until the root cause is identified

Manual repair elapsed time:

~ 2-4 hours

Shoreline repair elapsed time:

~ 1-2 minutes


The Problem

Java virtual machines (JVMs) often run into memory issues, usually because certain requests, payloads or jobs consume more memory than anticipated. The Java garbage collector is robust, so the situation eventually resolves itself. While it lasts, however, garbage collection takes priority over application work and latency spikes, leading to poor customer experiences. Permanently fixing this type of issue usually requires heap dumps and garbage collection statistics that are only available while the issue is occurring. Diagnosis is harder still because few engineers understand in depth how garbage collection works. SREs are frequently asked to capture this debug data, which can mean hours of SSH-ing into box after box trying to catch a JVM in the middle of the memory issue.
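One way to spot a JVM in this state without inspecting box after box is to sample `jstat -gcutil <pid>`, which reports each heap region's utilization as a percentage (the `O` column is the old generation) along with garbage collection counts and times (`FGC` is the number of full GCs). The Python sketch below parses that output by column name, because the exact set of columns varies between JDK versions. The function names and the 90% old-generation threshold are hypothetical choices for illustration, not values taken from this solution.

```python
import math


def parse_gcutil(output: str) -> dict[str, float]:
    """Parse the header and first sample row of `jstat -gcutil <pid>` output."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one sample row")
    header, values = rows[0], rows[1]
    # Columns differ across JDK versions, so pair each value with its header name.
    # jstat prints "-" for columns that do not apply; store those as NaN.
    return {name: (math.nan if raw == "-" else float(raw)) for name, raw in zip(header, values)}


def under_memory_pressure(sample: dict[str, float], old_gen_pct: float = 90.0) -> bool:
    """True when the old generation ("O", percent full) is at or above old_gen_pct (hypothetical default)."""
    old = sample.get("O", math.nan)
    return not math.isnan(old) and old >= old_gen_pct


def full_gcs_between(before: dict[str, float], after: dict[str, float]) -> int:
    """Number of full GCs ("FGC") between two samples; a rising count signals memory pressure."""
    return int(after["FGC"] - before["FGC"])
```

In practice the text passed to `parse_gcutil` would come from something like `subprocess.run(["jstat", "-gcutil", str(pid)], capture_output=True, text=True).stdout`, taken while the latency spike is happening.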

The Solution

With Shoreline, customers can set an alarm that fires when heap size exceeds a certain threshold. When the alarm fires, a script runs jcmd, jstack, jstat and jmap to capture a heap dump, thread dump, GC statistics and heap statistics. The collected data is pushed to a cloud storage service, and the JVM is then restarted. All of this happens in seconds, minimizing the impact on the customer experience. It also saves the SRE hours of exploratory work and ensures that engineering has everything it needs to fix the root cause of the issue.
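The threshold check and collection step can be sketched in Python. This is a minimal illustration, not Shoreline's actual script: the function names `heap_exceeds_threshold`, `diagnostic_commands` and `collect`, the 90% default threshold and the output file names are hypothetical, and the heap figures are assumed to come from whatever metric the alarm watches. The upload to cloud storage and the JVM restart are left out because they depend on the storage provider and on how the service is managed. The JDK invocations themselves are standard: `jcmd <pid> GC.heap_dump <file>`, `jstack <pid>`, `jstat -gcutil <pid>` and `jmap -histo <pid>`.

```python
import subprocess
from pathlib import Path


def heap_exceeds_threshold(used_bytes: int, max_bytes: int, threshold_pct: float = 90.0) -> bool:
    """Return True when heap usage is above threshold_pct percent of the maximum heap."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return used_bytes * 100.0 / max_bytes > threshold_pct


def diagnostic_commands(pid: int, out_dir: Path) -> dict[str, list[str]]:
    """Map each stdout log file name (hypothetical) to the JDK tool invocation that produces it.

    Lighter snapshots run first; the heap dump runs last because it pauses the
    JVM for the longest.
    """
    p = str(pid)
    return {
        "gcutil.txt": ["jstat", "-gcutil", p],  # GC statistics
        "threads.txt": ["jstack", p],  # thread dump
        "histo.txt": ["jmap", "-histo", p],  # heap statistics (class histogram)
        # jcmd writes the .hprof file itself; an absolute path keeps it out of the
        # target JVM's working directory. Its stdout is only a status message.
        "heap_dump.log": ["jcmd", p, "GC.heap_dump", str(out_dir.resolve() / "heap.hprof")],
    }


def collect(pid: int, out_dir: Path) -> list[str]:
    """Run every diagnostic command, save its stdout, and return descriptions of failures.

    Assumes the JDK tools are on PATH; a missing tool raises FileNotFoundError.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for log_name, cmd in diagnostic_commands(pid, out_dir).items():
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        (out_dir / log_name).write_text(result.stdout)
        if result.returncode != 0:
            failed.append(f"{cmd[0]}: {result.stderr.strip()}")
    return failed
```

For example, once `heap_exceeds_threshold` returns True, `collect(12345, Path("/var/tmp/jvm-diag"))` would write the GC statistics, thread dump and class histogram next to `heap.hprof`, ready to be uploaded before the restart. These tools typically need to run as the same OS user as the target JVM.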

Ready to get started?

Shoreline helps you eliminate repetitive tickets and increase your availability at the same time. Get started today with a free trial.