Kubernetes is an amazing way to deploy stateless applications. For example, it’s easy to scale up and scale down replicas of your application using a deployment.
However, stateful applications that involve disks (e.g., MySQL, RabbitMQ, Prometheus, Elasticsearch) are different. When a disk fills up, you can’t just create a copy, as the copied disk will also be full.
You often need to resize the disk, but how do you do this in Kubernetes? There are some challenges here:
How do we make sure the current data is intact?
How do we do this without corrupting the data as the application continues to write?
Below, we’ll answer these questions and discuss how to automatically detect and fix this problem.
After reading this post you’ll learn:
The resources involved in Kubernetes volume management.
How to resize a disk, including errors and failures you’ll need to handle.
How to automate detecting a filling disk and triggering a resize using CloudWatch Alarms and AWS Lambda.
Optionally, how you can automate this with Shoreline.
Resizing disks with code
Let’s start with a brief discussion of the quirks of Kubernetes volume management. One source of confusion is the resource names PersistentVolumeClaim (PVC) and PersistentVolume (PV): neither is called a disk. What do these terms mean?
PersistentVolumeClaims declare the hard drive requirements for a Pod. For example, my Pod may declare that it needs a 20 GB drive.
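For illustration, a claim like that might look as follows (the claim name and storage class here are hypothetical, not from any particular cluster):

```shell
# Declare a claim for a 20 GB drive (name and storage class are examples).
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-app-0
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 20Gi
EOF
```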
Let’s list all of the PersistentVolumeClaims in our cluster:
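```shell
# List PersistentVolumeClaims across all namespaces.
kubectl get pvc --all-namespaces
```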
Above, we can see each of the PersistentVolumeClaims, their namespaces, their capacities, and their storage classes. Note that all of them have a status of “Bound”.
Kubernetes will fulfill this request by provisioning a PersistentVolume. When a PersistentVolumeClaim has a PersistentVolume, it is bound to that volume.
PersistentVolume represents the true volume that is connected to your Pod. It’s where the data really goes.
Here’s how to list your PersistentVolumes:
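```shell
# List PersistentVolumes. These are cluster-scoped, so no namespace flag.
kubectl get pv
```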
AllowVolumeExpansion must be set to true
AllowVolumeExpansion is a big gotcha when configuring volumes in Kubernetes. Note above that every PersistentVolumeClaim has a StorageClass, which defines key parameters for a type of volume, such as performance. The parameter we need to check in this situation is allowVolumeExpansion. (Note: volume expansion requires Kubernetes 1.11 or later.)
If it’s false, then we can’t resize the volume. We need to make sure it is set to true. Here’s how to do that:
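First, check the current setting on your storage classes:

```shell
# The ALLOWVOLUMEEXPANSION column shows the current setting per class.
kubectl get storageclass
```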
Note that allowVolumeExpansion for gp2 is unset, which is actually very common. Let’s set it to true.
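One way to do this is with a patch against the storage class:

```shell
# Enable volume expansion on the gp2 storage class.
kubectl patch storageclass gp2 \
  --patch '{"allowVolumeExpansion": true}'
```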
Also note that the underlying driver needs to support volume expansion. For example, Amazon EBS eventually added expansion support with Elastic Volumes.
You can check if your driver supports expansion in this table.
Patching the PVC to change its size
Now let’s get to the matter at hand: resizing. To do that, we adjust the PersistentVolumeClaim by increasing its requested capacity. This, in turn, triggers a resize of the PersistentVolume bound to the claim.
This process is just like everything else in Kubernetes - we tell Kubernetes what we want, not how to do it. Underneath, Kubernetes will reach out to the volume driver to resize the disk. In this case, that’s going to be the EBS API.
Let’s increase the volume of our Prometheus server from 16 GB to 32 GB:
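A patch on the claim does this. (The PVC name and namespace below are assumptions; substitute the ones from your `kubectl get pvc` output.)

```shell
# Bump the requested storage on the Prometheus server's claim to 32Gi.
kubectl patch pvc prometheus-server \
  --namespace monitoring \
  --patch '{"spec": {"resources": {"requests": {"storage": "32Gi"}}}}'
```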
Let’s see if the resize event has gone through. Note that you might not see anything at first; because the operation is asynchronous, you’ll have to run this command in a loop.
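Events show up on the claim itself (again, the PVC name and namespace are assumptions):

```shell
# Recent events for the claim, including resize progress and failures.
kubectl get events --namespace monitoring \
  --field-selector involvedObject.name=prometheus-server
```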
As you can see, our disk was resized! Let’s do one final check to confirm it’s 32 GB.
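```shell
# The CAPACITY column should now read 32Gi.
kubectl get pvc prometheus-server --namespace monitoring
```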
Resize limitations and handling failures
Since resizing is an async operation, we don’t know how long it will take. The larger the disk and/or the change in disk size, the longer the resize can take. That’s why we might need to keep checking for events confirming that the resize has taken place.
Furthermore, we are limited in how often we can resize. The Amazon EC2 documentation states, “you must wait at least six hours and ensure that the volume is in the in-use or available state before you can modify the same volume.” If you hit this limit, you’ll get an error from AWS and then an error in Kubernetes. Also note that not all file systems support resizing: only XFS, Ext3, and Ext4 support automatic resize.
Automating detection and resize using CloudWatch Alarms + Lambda
Why automate? As most ops folks know, manually resizing a disk does not make sense in many situations. This creates a lot of work for the team, and can hurt the end user/customer experience. Here is an example of how to automate the work we did previously by using a scheduled CloudWatch event and a Lambda.
First, let’s create a shell script that computes disk usage.
Note that we’ll need to specify which disk we want to check:
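A minimal version just runs `df` inside the pod. (The pod name, namespace, and mount path here are assumptions; this also assumes GNU `df` is available in the container image.)

```shell
# Report usage of the data mount inside a specific pod.
kubectl exec --namespace monitoring prometheus-server-0 -- df -h /data
```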
Then let’s augment it to iterate over each of our pods.
Next, let’s add a condition: if the disk is too full, we patch the PVC to trigger a resize.
We’ll also need to loop over the result and keep checking, because it’s an async call.
We’ll also need to handle errors.
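Putting those steps together, a sketch of the script might look like the following. The namespace, label selector, mount path, thresholds, and the PVC naming convention (`data-<pod>`, a common StatefulSet pattern) are all assumptions, and it assumes GNU `df` in the container and volume sizes expressed in Gi:

```shell
#!/usr/bin/env bash
set -euo pipefail

NAMESPACE="monitoring"     # assumption: where the stateful pods live
LABEL="app=prometheus"     # assumption: label selector for the pods
MOUNT="/data"              # assumption: the data mount path
THRESHOLD=80               # resize when usage exceeds this percent
INCREMENT_GB=10            # how much to grow the volume each time

for pod in $(kubectl get pods -n "$NAMESPACE" -l "$LABEL" -o name); do
  pod="${pod#pod/}"
  # Percent used on the data mount, e.g. "83".
  used=$(kubectl exec -n "$NAMESPACE" "$pod" -- \
    df --output=pcent "$MOUNT" | tail -n 1 | tr -dc '0-9')

  if [ "$used" -gt "$THRESHOLD" ]; then
    # Assumption: the PVC is named data-<pod>.
    pvc="data-${pod}"
    current=$(kubectl get pvc -n "$NAMESPACE" "$pvc" \
      -o jsonpath='{.spec.resources.requests.storage}' | tr -dc '0-9')
    new=$((current + INCREMENT_GB))

    echo "Resizing $pvc from ${current}Gi to ${new}Gi"
    kubectl patch pvc -n "$NAMESPACE" "$pvc" \
      --patch "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"${new}Gi\"}}}}"

    # The resize is asynchronous: poll until the reported capacity matches,
    # and give up after a bounded number of attempts so failures surface.
    capacity=""
    for attempt in $(seq 1 30); do
      capacity=$(kubectl get pvc -n "$NAMESPACE" "$pvc" \
        -o jsonpath='{.status.capacity.storage}')
      [ "$capacity" = "${new}Gi" ] && break
      sleep 10
    done
    [ "$capacity" = "${new}Gi" ] || echo "Resize of $pvc did not complete" >&2
  fi
done
```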
Next, let’s put this in a Docker container. Follow this tutorial to wrap it in the AWS Lambda container interface and build your container image. Set up an ECR repository and push your container to the repository. Follow this tutorial to create an AWS Lambda event that triggers a lambda function.
Unfortunately, this still doesn’t execute in parallel (i.e., it will get slower as you manage more containers), and the interval between checks on each container will grow accordingly. In addition, you’ll need to integrate the container with your secrets propagation service. Worst case, you can bake your secrets into the container, but that’s insecure and will break when you rotate your secrets. This is particularly important because you need to get your kubectl credentials to the Lambda.
Automate detection and resize with Shoreline
In our example above, we only resized a volume one time. But going forward, how do we know when another resize is necessary? This is actually difficult to determine; often the demands on our databases or other stateful systems are dynamic. We don’t know when we’ll get a traffic spike.
That means that we’ll have to constantly monitor. When a resize is needed, we’ll then need to go through the above operations and handle any errors that come up. We’ll need a runbook for this.
All of the issues we’ve just discussed are things that Shoreline has worked to address as a part of our platform. Within our platform we have Op Packs – prebuilt automations that give you the metrics you’re going to track, the alarms that will fire, the actions that can be taken, and the scripts that will be run by the actions. You’ll have the option of running these Op Packs manually on the Shoreline platform, or turning on the automated version once you’re comfortable.
Shoreline’s disk Op Pack automatically detects the need for a resize and kicks off the process. We can even use Terraform to configure the Op Pack. We’ll set it up to monitor all of our Prometheus servers, and when any of the disks get to 80% full, automatically add 10 GB of storage, up to 200 GB. It runs on GCP and AWS.
We just need to do a terraform apply to store this configuration into the cluster:
This uses our verified terraform provider. With that configuration, Shoreline will install local control loops on each node, continuously searching for full disks. If a full disk is detected, the resize is automatically started.
We handle the asynchronous nature of the request and any failures that arise along the way. You won’t see this ticket again. The same approach applies to every stateful application you have - the disk resize issue is squashed across the fleet, in eight lines of Terraform.