April 26, 2021 | Narendra Nath Challa

Tutorial: automating Kubernetes worker node retirement


Introduction

As powerful as Kubernetes is, it doesn't know what is relevant to your bespoke application software. Kubernetes will do its thing and schedule pods, which are abstract, rather than specific to your company and this can cause problems. One of the jobs operational and platform teams have is to encode rules that handle the nuance of their organization's applications in order to prevent orchestration logic from causing an outage. Shoreline is designed to bridge the automation gap between your application software and your orchestration layer.

This guide shows you how to manually decommission Kubernetes worker nodes and replace them with a new host and then shows you how to automate that process with a Shoreline Op Pack. The Op Pack will automatically and cleanly remove a node scheduled for retirement and allow the autoscaling groups to replace it with a new host on the latest configurations.

This is a relatively common process, but automating it can help you avoid several pitfalls; primarily that Kubernetes will do exactly what it’s told when it’s told to do it, and sometimes that involves behaviors that aren’t what you would otherwise expect.

For example, a few common pitfalls when retiring Kubernetes worker nodes include:

  • Forgetting to drain nodes before deleting them
  • Not waiting for pods to fully migrate before removing cluster capacity
  • Not adding capacity before retiring multiple nodes, leaving the cluster without enough capacity to run its software
  • Forgetting about custom labeling or pod workloads with specific node affinity, leading to difficulty rescheduling pods

Let's dive in. We'll start with at an example architecture, manually reprovision a node, and then see how much easier (and safer) it is to do this using Shoreline.

Our Infrastructure

We have a simple architecture today with worker nodes in an EKS provisioned cluster with our application running distributed across each node. Because our apps could be running on any node at any time, it’s important to be mindful of our processes to prevent disruption to the business.

Within this infrastructure, we’ll need to make some changes to our AutoScaling Group and reprovision a node with the updated configuration.

The Manual Steps

First, see what pods are running and where in order to find our blast radius for this maintenance.

Ensure that no other pods will be added to this blast radius while we work.

Remove any pods running on this node.

Verify that our applications are running on other nodes now. If we skip this step, we risk progressing too quickly and impacting the application by removing too many applications from service leading to resource contention.

Safely delete the node from the cluster to remove it from any other management/automation.

Verify the work has completed.

Permanently delete the instance to ensure the AutoScaling Group brings in a new, fresh node.

Verify that the cluster now sees the node.

While conceptually, the process of setting up the cluster and reprovisioning nodes is a simple matter, there’s a lot of manual steps for it to be a safe practice. Here we had to take several Kubernetes specific steps, and several EKS specific steps including hopping out of our CLI to find and delete the appropriate node in the console.

Automating with this Shoreline

Now, we'll show you how to automate this work using Shoreline and Shoreline's operations language, Op. Shoreline Op is a purpose-built operations oriented language designed to allow operators and admins to rapidly understand, debug, and fix systems during an operational event, and automate the tasks performed during mitigation in order to reduce or eliminate future manual intervention.

There are two scripts that we will use to automate this work.

Script 1

identify-ec2-scheduled-events.sh Identifies the EC2 scheduled events using EC2 metadata IMDS v1. The container where the script is executed should have access to the EC2 metadata to retrieve information about EC2 events.

EVENT_TYPE is passed as an environment variable through a Shoreline Op command.

Script 2

retire-k8s-node.sh This script also uses AWS EC2 metadata to get region and instance-id ..etc. It uses kubectl command to drain a node and then uses aws cli binary to terminate EC2 instances.

Both the scripts use curl and jq to get EC2 metadata and parse it. These two scripts can be combined into one, but I choose to have two scripts to mix and match actions with some of my other alarms. The scripts are written in bash but can also be modified to use any other shell installed in your environment.

${ACTION_NODE_NAME},${grace_period},${timeout} environment variables are set as part of action execution.

Op commands to automate execution using a bot

The Op commands below assume the two scripts are loaded in the container at /scripts path in a Shoreline pod. Each Shoreline pod has a single container with AWS cli, kubectl, curl and jq installed on an Ubuntu base image.

I can select these containers running on Shoreline pods using op with Kubernetes' selector app="shoreline" on the pods and piping the output to container. I will create a resource called shoreline to make it easier to select the Shoreline container for subsequent Op statements.

The first action check_instance_event executes takes one input EVENT_TYPE that the script 1 expects. In the example, I am looking for “instance-retirement” type of Ec2 event.

If the EVENT_TYPE matches the “instance-retirement” when the script is executed then the script exits with exit status 1 which in turn triggers an alarm retirement_alarm.

The second action, retire_node, drains a node and then terminates the node using the AWS binary. This action takes two input parameters grace_period (grace period for pods to be evicted) and timeout (timeout for the kubectl command).

The alarm and action are attached together using a bot called retirement_bot. Whenever the retirement_alarm is triggered, then retire_node is triggered to drain the node.

Permission prerequisites:

The pod where the actions are run should have a service account with the following api permissions attached using role binding.

Permission required to run the AWS command

EC2:TerminateInstances This can be attached to the worker node role or can be attached to the pod where the action is being executed using IAM roles for Service accounts.

Conclusion

We can use this automation when an EC2 instance is running on degraded hardware. AWS will schedule maintenance of these nodes and will change the underlying host. Downtime is often involved in this process, but we can automate this process using Shoreline to prevent downtime and eliminate the need for manual intervention.

Ready to get started?

Shoreline helps you eliminate repetitive tickets and increase your availability at the same time. Get started today with a free trial.