Kubernetes Node Retirement

Customer Experience Impact: High – potentially hours of downtime

Frequency of Occurrence: Monthly for fleets with hundreds of nodes

Manual repair elapsed time: ~2-4 hours

Shoreline repair elapsed time: ~0

The Problem

From time to time, AWS needs to repair or upgrade a server in its network. If you are using that server, AWS Systems Manager will send you an alert letting you know that the node has been marked for retirement. This happens infrequently, so many companies have not designed a way to gracefully shut down the work running on the node. As a result, AWS is often forced to take the server offline, abruptly killing any services running on it. This can lead to data loss and customer downtime.
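For teams that want to check for retirement notices themselves, EC2 also exposes them as scheduled events. The following is a minimal sketch, assuming you have captured the JSON output of `aws ec2 describe-instance-status --filters Name=event.code,Values=instance-retirement`. The function name and the sample instance IDs are hypothetical, and this is not how Shoreline's own alarm is implemented.

```python
import json

# Event code EC2 uses for scheduled retirement.
RETIREMENT_CODE = "instance-retirement"


def instances_marked_for_retirement(describe_status_json):
    """Return the IDs of instances that have a pending retirement event.

    `describe_status_json` is the raw JSON text printed by
    `aws ec2 describe-instance-status` (an assumption of this sketch).
    """
    payload = json.loads(describe_status_json)
    marked = []
    for status in payload.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            # EC2 prefixes finished events with "[Completed]"; skip those.
            pending = not event.get("Description", "").startswith("[Completed]")
            if event.get("Code") == RETIREMENT_CODE and pending:
                marked.append(status["InstanceId"])
                break
    return marked


if __name__ == "__main__":
    # Hypothetical sample data shaped like the CLI response.
    sample = json.dumps({
        "InstanceStatuses": [
            {"InstanceId": "i-0aaa1111", "Events": [
                {"Code": "instance-retirement",
                 "Description": "The instance is running on degraded hardware"}]},
            {"InstanceId": "i-0bbb2222", "Events": []},
        ]
    })
    print(instances_marked_for_retirement(sample))
```

Parsing saved CLI output keeps the sketch free of third-party dependencies; a production check would more likely call the EC2 API directly and run on a schedule.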

The Solution

Shoreline makes it easy to cleanly handle nodes marked for retirement. First, a pre-built Shoreline alarm triggers whenever AWS EC2 marks a node for retirement. From there, Shoreline automates cordoning, draining, and terminating the node. Terminating the node prompts Kubernetes to automatically spin up a replacement. This approach ensures that all services running on the node are gracefully terminated without interrupting any transactions.

Sometimes a node marked for retirement gets stuck partway through the retirement process. If this happens, Kubernetes may still think the node is online and will not spin up a replacement. In this case, Shoreline terminates or restarts the node, which at least ensures that the right capacity is available for all applications.
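The cordon, drain, and terminate sequence can be sketched with standard tooling. This is an illustrative outline only, assuming `kubectl` and the AWS CLI are installed and authenticated; the function names, the 300-second drain timeout, and the choice to terminate even when the drain fails are assumptions of this sketch, not Shoreline's implementation.

```python
import subprocess


def retirement_commands(node_name, instance_id, drain_timeout="300s"):
    """Build the cordon, drain, and terminate commands for one node."""
    return [
        # Stop new pods from being scheduled onto the node.
        ["kubectl", "cordon", node_name],
        # Evict running pods gracefully, leaving DaemonSet pods in place.
        ["kubectl", "drain", node_name, "--ignore-daemonsets",
         "--delete-emptydir-data", f"--timeout={drain_timeout}"],
        # Terminate the underlying EC2 instance.
        ["aws", "ec2", "terminate-instances", "--instance-ids", instance_id],
    ]


def retire_node(node_name, instance_id, run=subprocess.run):
    """Cordon and drain the node, then terminate it.

    Returns True if the node drained cleanly. If the cordon or drain step
    fails (the "stuck" case), the instance is still terminated so that
    replacement capacity can come up.
    """
    cordon, drain, terminate = retirement_commands(node_name, instance_id)
    drained = True
    try:
        run(cordon, check=True)
        run(drain, check=True)
    except subprocess.CalledProcessError:
        drained = False
    run(terminate, check=True)
    return drained


if __name__ == "__main__":
    # Print the commands instead of running them (hypothetical identifiers).
    for cmd in retirement_commands("ip-10-0-1-23.ec2.internal", "i-0aaa1111"):
        print(" ".join(cmd))
```

Passing `run` as a parameter lets you substitute a fake runner to exercise the stuck-node path without touching a real cluster.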

Ready to get started?

Shoreline helps you eliminate repetitive tickets and increase your availability at the same time. Get started today with a free trial.