Runbook

Overloading of Resources in Apache Airflow

Overview

This incident occurs when the level of task parallelism or concurrency in Apache Airflow, a platform for scheduling and monitoring workflows, exceeds what the available system resources can support. Running too many tasks at once exhausts resources such as CPU and memory, which can cause individual tasks, or even the entire system, to fail. Resolving it means limiting how many tasks can run simultaneously and matching the resources they require to what the system can actually provide.

Debug

List all running pods in the Airflow namespace
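
The original command for this step is not preserved here. A minimal sketch, assuming Airflow runs on Kubernetes, `kubectl` is configured for the cluster, and the namespace is named `airflow` (an assumption; override `AIRFLOW_NAMESPACE` if yours differs):

```shell
# Assumed namespace name; adjust to match your deployment.
AIRFLOW_NAMESPACE="${AIRFLOW_NAMESPACE:-airflow}"

# List pods whose phase is Running, including the node each one is scheduled on.
list_running_pods() {
  kubectl get pods --namespace "$AIRFLOW_NAMESPACE" \
    --field-selector=status.phase=Running -o wide
}
```

Calling `list_running_pods` prints one row per pod. A large number of worker or task pods packed onto the same node is a first hint of overload.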

View the logs of a specific pod
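
One possible way to do this, again assuming a namespace named `airflow`. The helper name `pod_logs` and the default of 200 lines are illustrative choices, not part of the original runbook:

```shell
# Assumed namespace name; adjust to match your deployment.
AIRFLOW_NAMESPACE="${AIRFLOW_NAMESPACE:-airflow}"

# Print the most recent log lines from every container in a pod.
#   $1 = pod name (required)
#   $2 = number of trailing lines to show (defaults to 200)
pod_logs() {
  kubectl logs "$1" --namespace "$AIRFLOW_NAMESPACE" \
    --all-containers --tail="${2:-200}"
}
```

For example, `pod_logs <pod-name> 50` shows the last 50 lines per container. If a container has already restarted, running `kubectl logs` with `--previous` shows the logs of the terminated instance, which is where out-of-memory errors usually appear.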

Check the status of the Airflow scheduler
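
A hedged sketch of two checks. The label selector `component=scheduler` is an assumption based on common Helm-chart labeling and may not match your deployment. The second helper assumes the Airflow CLI is available inside the scheduler pod and that the Airflow version provides `airflow jobs check`:

```shell
# Assumed namespace and label selector; adjust both to match your deployment.
AIRFLOW_NAMESPACE="${AIRFLOW_NAMESPACE:-airflow}"
SCHEDULER_SELECTOR="${SCHEDULER_SELECTOR:-component=scheduler}"

# Show scheduler pods with their phase, restart count, and node.
scheduler_pods() {
  kubectl get pods --namespace "$AIRFLOW_NAMESPACE" \
    --selector "$SCHEDULER_SELECTOR" -o wide
}

# Ask Airflow itself whether a scheduler job is alive.
#   $1 = name of a scheduler pod
scheduler_health() {
  kubectl exec --namespace "$AIRFLOW_NAMESPACE" "$1" -- \
    airflow jobs check --job-type SchedulerJob
}
```

A scheduler pod with a growing restart count, or a failing `scheduler_health <scheduler-pod>` check, suggests the scheduler itself is being starved of resources.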

Check the resource usage of a specific pod
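
A minimal sketch using `kubectl top`, which only works if a metrics provider such as metrics-server is installed in the cluster (an assumption about your environment):

```shell
# Assumed namespace name; adjust to match your deployment.
AIRFLOW_NAMESPACE="${AIRFLOW_NAMESPACE:-airflow}"

# Show current CPU and memory usage for each container in one pod.
#   $1 = pod name
pod_usage() {
  kubectl top pod "$1" --namespace "$AIRFLOW_NAMESPACE" --containers
}
```

Run it as `pod_usage <pod-name>` and compare the reported usage against that pod's configured limits.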

Check the resource usage of all pods in the Airflow namespace
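
The same idea across the whole namespace, sorted so the heaviest consumers appear first. This also assumes metrics-server (or an equivalent) is available. The helper name is hypothetical:

```shell
# Assumed namespace name; adjust to match your deployment.
AIRFLOW_NAMESPACE="${AIRFLOW_NAMESPACE:-airflow}"

# Show usage for every pod in the namespace, highest consumers first.
#   $1 = metric to sort by: "cpu" or "memory" (defaults to memory)
namespace_usage() {
  kubectl top pods --namespace "$AIRFLOW_NAMESPACE" --sort-by="${1:-memory}"
}
```

`namespace_usage cpu` sorts by CPU instead of memory.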

Check the resource requests and limits for a specific pod
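
One way to read these values straight from the pod spec is a JSONPath query. This is a sketch, with the namespace name assumed as before:

```shell
# Assumed namespace name; adjust to match your deployment.
AIRFLOW_NAMESPACE="${AIRFLOW_NAMESPACE:-airflow}"

# Print each container's name followed by its resources block
# (requests and limits) from the pod spec.
#   $1 = pod name
pod_resources() {
  kubectl get pod "$1" --namespace "$AIRFLOW_NAMESPACE" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}
```

A container that prints an empty resources block has no requests or limits set, so nothing stops it from consuming as much of the node as it can.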

Check the resource requests and limits for all pods in the Airflow namespace
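
For a namespace-wide view, a custom-columns table is a compact option. The column names below are illustrative, and the namespace name is again an assumption:

```shell
# Assumed namespace name; adjust to match your deployment.
AIRFLOW_NAMESPACE="${AIRFLOW_NAMESPACE:-airflow}"

# Tabulate CPU and memory requests and limits for every pod in the namespace.
# Pods with several containers show comma-separated values per column.
namespace_resources() {
  kubectl get pods --namespace "$AIRFLOW_NAMESPACE" \
    -o custom-columns='POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory,CPU_LIM:.spec.containers[*].resources.limits.cpu,MEM_LIM:.spec.containers[*].resources.limits.memory'
}
```

Unset values are shown as `<none>`, which makes pods without requests or limits easy to spot.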

Repair

Review the current parallelism and concurrency settings in Apache Airflow and adjust them to a level that can be supported by the available system resources.
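
As a starting point for the review, the current values can be read from a pod that has the Airflow CLI installed. This is a sketch under several assumptions: the namespace is named `airflow`, the pod you pass in runs Airflow, and the Airflow version uses the `[core]` option names `parallelism`, `max_active_tasks_per_dag`, and `max_active_runs_per_dag` (older releases name some of these differently):

```shell
# Assumed namespace name; adjust to match your deployment.
AIRFLOW_NAMESPACE="${AIRFLOW_NAMESPACE:-airflow}"

# Print the effective value of each concurrency-related [core] option.
#   $1 = name of a pod with the Airflow CLI (for example a scheduler pod)
show_concurrency_settings() {
  for key in parallelism max_active_tasks_per_dag max_active_runs_per_dag; do
    printf '%s = ' "core.$key"
    kubectl exec --namespace "$AIRFLOW_NAMESPACE" "$1" -- \
      airflow config get-value core "$key"
  done
}
```

Compare the reported values with the usage and limits gathered in the Debug steps. Airflow options can be overridden with environment variables of the form `AIRFLOW__CORE__PARALLELISM`. How those are set (Helm values, deployment manifests, or `airflow.cfg`) depends on how your installation is managed.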
