Stop-the-world garbage collection pauses every application thread, which makes it an issue for any company that runs the JVM at scale. If your on-call team doesn’t get there at the moment the problem occurs, the evidence is gone, and they are left guessing at what happened. They go from machine to machine hoping to observe the issue again, wasting valuable time and resources. And while tools are available to debug the JVM, they take tremendous effort to set up and rarely solve the problem, because they need to connect at just the right time to witness it.

  • Identifying the problem by monitoring all of your JVMs, capturing per-second data whether you run a few JVMs or thousands.

  • When a memory or performance limit is reached, capturing garbage collection, heap, thread, and deadlock data in an S3 bucket so the engineering team can find and permanently fix the root cause.

  • Optionally restarting the JVM or taking another remediating action such as scaling up or down.
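
To make the capture step concrete, here is a minimal sketch of reading garbage collection, heap, thread, and deadlock data from inside a JVM with the standard java.lang.management API. It is an illustration only, not how any particular monitoring product works. The class name JvmSnapshot, the 90% heap threshold, and the three-sample polling loop are assumptions made for the example. Uploading to S3 and restarting the JVM are left out.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

public class JvmSnapshot {
    // Hypothetical limit: capture once 90% of the maximum heap is in use.
    static final double HEAP_LIMIT = 0.90;

    /** Fraction of the maximum heap currently in use, or -1 if the maximum is undefined. */
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    /** True when the observed usage has reached the configured limit. */
    public static boolean limitReached(double usedFraction, double limit) {
        return usedFraction >= limit;
    }

    /** Collects heap, GC, thread, and deadlock data as plain text. */
    public static String capture() {
        StringBuilder sb = new StringBuilder();
        sb.append("heapUsedFraction=").append(heapUsedFraction()).append('\n');

        // Cumulative collection count and time for each collector in this JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("gc=").append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        sb.append("threads=").append(threads.getThreadCount()).append('\n');

        // Both calls return null when no deadlock is found.
        long[] deadlocked = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        sb.append("deadlockedThreads=")
          .append(deadlocked == null ? 0 : deadlocked.length)
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample once per second; three samples keep the demo short.
        for (int i = 0; i < 3; i++) {
            if (limitReached(heapUsedFraction(), HEAP_LIMIT)) {
                System.out.println(capture());
            }
            Thread.sleep(1000);
        }
    }
}
```

A real deployment would more likely poll from outside the process, for example over JMX, so that a stop-the-world pause in the monitored JVM does not also pause the monitor.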

