Pods Stuck in Terminating

Highlights

Customer Experience Impact: High - Apps fail to schedule, causing unavailability

Frequency of Occurrence: Weekly for fleets with hundreds of nodes

Manual SRE time spent on diagnosis and repair: ~2-4 hours

Shoreline time to repair: ~0

The Problem

When Kubernetes pods won’t leave the Terminating state, the underlying node is often broken. While the node is in this condition, apps may fail to schedule, causing unavailability. The issue can also become a financial drain on your organization, because it can lead to unnecessary scaling.

This issue is difficult for many teams to diagnose because pods routinely pass through the Terminating state during normal operation, so it is hard to tell which ones have been there for too long. Fixing it is also complex, since node draining in Kubernetes must be configured to work for your environment, taking into account time-out periods, pod disruption budgets, and other cluster-wide configurations.
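To make the diagnosis step concrete, here is a minimal Python sketch of how "terminating for too long" could be detected. It is an illustration, not Shoreline's implementation. It assumes the pod list comes from `kubectl get pods --all-namespaces -o json` (the `items` array), that a terminating pod carries `metadata.deletionTimestamp`, and that a 10-minute threshold is acceptable; the function names and the threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    """Parse a Kubernetes timestamp such as '2024-01-01T12:00:00Z' as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def find_stuck_pods(pods, now, max_overdue=timedelta(minutes=10)):
    """Return (namespace, name, node) for pods still present long after deletion.

    `pods` is a list of pod objects as plain dicts. `max_overdue` is an
    assumed threshold; tune it to your own grace periods.
    """
    stuck = []
    for pod in pods:
        meta = pod.get("metadata", {})
        deletion_ts = meta.get("deletionTimestamp")
        if deletion_ts is None:
            continue  # pod is not terminating
        overdue = now - parse_k8s_time(deletion_ts)
        if overdue > max_overdue:
            node = pod.get("spec", {}).get("nodeName")
            stuck.append((meta.get("namespace"), meta.get("name"), node))
    return stuck
```

Grouping the results by node name would then show which nodes host the stuck pods and are candidates for draining.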

The Solution

Shoreline’s Pods Stuck in Terminating Op Pack talks to the Kubernetes control plane, checks pod states, and determines whether a pod has been terminating for too long. When it finds one, it remediates by cordoning, draining, and then terminating the node, so that the node is cleaned up safely and stops impacting other software.
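The cordon, drain, and terminate sequence can be sketched with standard `kubectl` commands. This is a hedged illustration of the general pattern, not the Op Pack itself. The function names, the 300-second drain timeout, and the choice of drain flags are assumptions, and terminating the underlying machine is cloud-provider specific, so the sketch stops at removing the node object from the cluster.

```python
import subprocess


def remediation_plan(node, drain_timeout_s=300):
    """Build the ordered kubectl commands for one node (nothing is executed here)."""
    return [
        # 1. Cordon: mark the node unschedulable so no new pods land on it.
        ["kubectl", "cordon", node],
        # 2. Drain: evict pods, which respects pod disruption budgets.
        #    The flags and timeout are assumptions; adjust them to your environment.
        ["kubectl", "drain", node, "--ignore-daemonsets",
         "--delete-emptydir-data", f"--timeout={drain_timeout_s}s"],
        # 3. Remove the node object. Terminating the actual instance is
        #    provider specific and intentionally left out of this sketch.
        ["kubectl", "delete", "node", node],
    ]


def run_plan(plan, runner=subprocess.run):
    """Run each command in order; with subprocess.run, a failure stops the plan."""
    for argv in plan:
        runner(argv, check=True)
```

Separating the plan from its execution makes it possible to print and review the commands, or pass a stub `runner`, before anything touches a real cluster.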

Ready to get started?

Shoreline helps you eliminate repetitive tickets and increase your availability at the same time. Get started today with a free trial.