High – potentially hours of downtime
Monthly for fleets with hundreds of nodes
~ 2-4 hours
~ 0
From time to time, AWS needs to repair or upgrade a server in its network. If you are using that server, AWS Systems Manager will send you an alert letting you know that the node has been marked for retirement. Because this happens infrequently, many companies haven’t designed a way to gracefully shut down the work running on the node. As a result, AWS is often forced to take the server offline, abruptly killing any services running on it. This can lead to data loss and customer downtime.
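To make the detection step concrete, here is a minimal sketch of how pending retirements could be picked out of EC2 status data. It assumes the input dictionary has the shape returned by boto3's `describe_instance_status` call, where scheduled events carry a `Code` such as `instance-retirement`. The function name and the sample data are hypothetical and are not part of Shoreline's or AWS's tooling.

```python
from datetime import datetime, timezone

RETIREMENT_CODE = "instance-retirement"


def instances_marked_for_retirement(response):
    """Return (instance_id, not_before) pairs for pending retirement events.

    `response` is assumed to look like the dict returned by
    boto3's ec2.describe_instance_status().
    """
    pending = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            if event.get("Code") != RETIREMENT_CODE:
                continue
            # EC2 prefixes the description of finished events with "[Completed]".
            if event.get("Description", "").startswith("[Completed]"):
                continue
            pending.append((status["InstanceId"], event.get("NotBefore")))
    return pending


# Hypothetical sample data, for illustration only.
sample_response = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-0123456789abcdef0",
            "Events": [
                {
                    "Code": "instance-retirement",
                    "Description": "The instance is running on degraded hardware",
                    "NotBefore": datetime(2030, 1, 15, tzinfo=timezone.utc),
                }
            ],
        },
        {
            "InstanceId": "i-0fedcba9876543210",
            "Events": [
                {
                    "Code": "system-reboot",
                    "Description": "Scheduled reboot",
                    "NotBefore": datetime(2030, 1, 20, tzinfo=timezone.utc),
                }
            ],
        },
    ]
}

for instance_id, deadline in instances_marked_for_retirement(sample_response):
    print(f"{instance_id} is scheduled for retirement on or after {deadline}")
```

In a real account, the dictionary would come from a boto3 EC2 client rather than from hard-coded sample data, and the resulting list would feed whatever alerting you already have in place.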
Shoreline makes it easy to cleanly handle nodes marked for retirement. First, Shoreline has a pre-built alarm that triggers whenever AWS EC2 marks a node for retirement. From there, Shoreline automates the process of cordoning, draining, and terminating the node. This in turn triggers Kubernetes to automatically spin up a replacement node. This approach ensures that all services running on the node are gracefully terminated without interrupting any transactions.
Sometimes a node marked for retirement gets stuck partway through the retirement process. If this happens, Kubernetes may still think the node is online and won’t spin up a replacement. In this case, Shoreline terminates or restarts the node, which at least ensures that the right capacity is available for all applications.
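The cordon, drain, and terminate sequence, including the fallback for a stuck node, can be sketched roughly as follows. This is not Shoreline's implementation. It is an illustrative outline that assumes the `kubectl` and `aws` CLIs are installed and authenticated, and the function names, the timeout value, and the node and instance identifiers are all hypothetical.

```python
import subprocess


def retirement_steps(node_name, instance_id, drain_timeout="300s"):
    """Build the ordered commands for retiring one Kubernetes node on EC2."""
    return [
        # Stop new pods from being scheduled onto the node.
        ("cordon", ["kubectl", "cordon", node_name]),
        # Evict running pods gracefully, giving up after the timeout.
        ("drain", ["kubectl", "drain", node_name,
                   "--ignore-daemonsets", "--delete-emptydir-data",
                   f"--timeout={drain_timeout}"]),
        # Terminate the instance so replacement capacity can be provisioned.
        ("terminate", ["aws", "ec2", "terminate-instances",
                       "--instance-ids", instance_id]),
    ]


def _run(argv):
    """Default runner: execute the command and return its exit code."""
    try:
        return subprocess.run(argv, check=False).returncode
    except OSError:
        # The CLI is missing or not executable; report a failure code.
        return 127


def retire_node(node_name, instance_id, run=_run):
    """Run every step in order and record which ones succeeded.

    Termination is attempted even if cordon or drain fails, so a node
    that is stuck partway through retirement still gets replaced.
    """
    outcome = {}
    for name, argv in retirement_steps(node_name, instance_id):
        outcome[name] = run(argv) == 0
    return outcome


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def show(argv):
        print(" ".join(argv))
        return 0

    retire_node("ip-10-0-1-23.ec2.internal", "i-0123456789abcdef0", run=show)
```

Passing the runner in as a parameter keeps the sketch easy to dry-run and test. Whether to terminate after a failed drain is a design choice: it trades a graceful shutdown for restored capacity, which matches the fallback behavior described above.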