Runbook

Kafka Consumer Group Lag Incident


Overview

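A Kafka Consumer Group Lag incident occurs when the lag of a consumer group, meaning the gap between the latest offset written to each partition and the offset the group has committed, exceeds the expected threshold. Sustained lag delays data processing, and messages can be lost if they age out of the topic's retention window before they are consumed, leading to service degradation or failure.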
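To make the threshold check concrete, here is a minimal sketch of how lag can be computed from offsets. The offset values and the `LAG_THRESHOLD` constant are hypothetical; in practice the offsets would come from the cluster and the threshold from your own alerting policy.

```python
from typing import Dict

# Hypothetical alert threshold, in messages; choose a value that fits the workload.
LAG_THRESHOLD = 10_000


def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Messages written to a partition that the group has not yet committed."""
    return max(log_end_offset - committed_offset, 0)


def group_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> int:
    """Total lag over the partitions for which a committed offset is known."""
    return sum(
        partition_lag(end_offsets[p], committed[p])
        for p in end_offsets
        if p in committed
    )


# Made-up offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 900, 2: 2000}
committed = {0: 1200, 1: 900, 2: 500}

total = group_lag(end_offsets, committed)
print(total, total > LAG_THRESHOLD)
```

Partitions without a committed offset are skipped here for simplicity; how they should be counted depends on the consumer's offset reset policy.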


Debug

List the brokers in the Kafka cluster

Check the status of the Kafka brokers and ZooKeeper
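For the ZooKeeper half of this check, one option is the `ruok` four-letter command, to which a healthy server replies `imok`. The sketch below assumes the command is enabled on the server (recent ZooKeeper versions require it to be listed in `4lw.commands.whitelist`) and that the client port is the common default 2181; the function name is illustrative.

```python
import socket


def zookeeper_is_ok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send the 'ruok' four-letter command and report whether the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            reply = b""
            # Read until four bytes have arrived or the server closes the connection.
            while len(reply) < 4:
                chunk = conn.recv(16)
                if not chunk:
                    break
                reply += chunk
            return reply == b"imok"
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A `False` result only says the probe failed; it does not distinguish a stopped server from a disabled command or a blocked port.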

Check the consumer group status

Check the partition lag for the consumer group
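Per-partition lag is usually read from the output of the `kafka-consumer-groups` tool run with `--describe --group <group>`. As a rough sketch, the helper below turns that tabular output into a dictionary keyed by topic and partition. It locates columns by header name because the column set has varied between Kafka versions; the sample text is made-up data in that layout, not captured output.

```python
from typing import Dict, List, Tuple


def parse_describe_output(text: str) -> List[Dict[str, str]]:
    """Split whitespace-separated describe output into one dict per data row."""
    rows: List[Dict[str, str]] = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "TOPIC" in fields and "LAG" in fields:
            header = fields  # header row: remember the column names
            continue
        if header is not None and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def lag_by_partition(rows: List[Dict[str, str]]) -> Dict[Tuple[str, int], int]:
    """Map (topic, partition) to lag, skipping rows whose lag is reported as '-'."""
    lags: Dict[Tuple[str, int], int] = {}
    for row in rows:
        if row["LAG"].isdigit():
            lags[(row["TOPIC"], int(row["PARTITION"]))] = int(row["LAG"])
    return lags


# Illustrative sample only; group, topic, and client names are invented.
SAMPLE = """
GROUP   TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST       CLIENT-ID
orders  events  0          1200            1500            300  c1-abc       /10.0.0.5  c1
orders  events  1          900             900             0    c1-abc       /10.0.0.5  c1
orders  events  2          -               2000            -    -            -          -
"""

print(lag_by_partition(parse_describe_output(SAMPLE)))
```

A partition whose lag shows as `-` has no committed offset or no active member, which is worth investigating separately from partitions with a large numeric lag.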

Check topic metadata to see whether partitions or partition leaders are unevenly distributed across brokers
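One way to judge balance is to count how many partition leaders each broker holds. The sketch below assumes the partition-to-leader mapping has already been extracted from the topic metadata; the `tolerance` parameter and the broker IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def leaders_per_broker(partition_leaders: Dict[int, int]) -> Counter:
    """Count how many partitions each broker currently leads."""
    return Counter(partition_leaders.values())


def is_unbalanced(
    partition_leaders: Dict[int, int],
    broker_ids: Iterable[int],
    tolerance: int = 1,
) -> bool:
    """True when the busiest and idlest brokers differ by more than `tolerance` leaders."""
    counts = leaders_per_broker(partition_leaders)
    # Include brokers that lead nothing, since they are part of the imbalance.
    per_broker = [counts.get(broker, 0) for broker in broker_ids]
    return max(per_broker) - min(per_broker) > tolerance


# Made-up mapping of partition -> leader broker ID for a six-partition topic.
skewed = {0: 1, 1: 1, 2: 1, 3: 2, 4: 1, 5: 3}
even = {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3}

print(is_unbalanced(skewed, [1, 2, 3]), is_unbalanced(even, [1, 2, 3]))
```

Leader count is only one signal; partition size and traffic per partition can be skewed even when leaders are evenly spread.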

Check the disk space usage on the Kafka brokers
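When run on a broker host, a check like the following reports how full the volume holding the Kafka data is. This is a minimal sketch: the path passed in should be one of the broker's configured `log.dirs` directories, and the 85% threshold is an arbitrary example value.

```python
import shutil


def disk_used_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def disk_is_critical(path: str, threshold_percent: float = 85.0) -> bool:
    """True when usage has reached the (hypothetical) alert threshold."""
    return disk_used_percent(path) >= threshold_percent


# "." is a stand-in; point this at the broker's data directory instead.
print(round(disk_used_percent("."), 1), disk_is_critical("."))
```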

Check the network connectivity between the brokers and the consumer group
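A simple way to test reachability from a consumer host is to time a TCP connection to each broker's listener. The sketch below does only that; it does not speak the Kafka protocol, and the host name and port 9092 in the usage comment are placeholders for your own broker addresses.

```python
import socket
import time
from typing import Optional


def connect_latency_ms(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Time a TCP connect in milliseconds, or return None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Refused, timed out, or unresolvable: treat all as unreachable.
        return None


# Example (placeholder address): connect_latency_ms("broker-1.example.internal", 9092)
```

Run it from the machines where the consumers live, since a broker can be reachable from an operator's workstation and still be unreachable from the consumer network.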

Check the ZooKeeper logs for any errors

Check the Kafka broker logs for any errors
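Both log checks come down to filtering for high-severity lines. The following sketch matches the level tokens `ERROR` and `FATAL` as whole words, which assumes the logs use the usual log4j-style level names; it can also match a line that merely mentions one of those words in its message. The sample lines are placeholders, not real broker output.

```python
import re
from typing import Iterable, List

# Whole-word match on the severity tokens; adjust if the log layout differs.
LEVEL_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain an ERROR or FATAL level token."""
    return [line.rstrip("\n") for line in lines if LEVEL_PATTERN.search(line)]


# Placeholder lines standing in for a log file's contents.
sample = [
    "<timestamp> INFO placeholder startup message\n",
    "<timestamp> ERROR placeholder failure message\n",
    "<timestamp> WARN placeholder warning message\n",
]

for line in error_lines(sample):
    print(line)
```

To scan a real file, pass an open file object, for example `error_lines(open(path))`, where `path` is the location of the broker or ZooKeeper log on that host.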

Possible cause: one or more Kafka brokers in the cluster are down or responding slowly, so the consumer group falls behind in processing messages.

Repair

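Increase the number of consumers in the group to handle the load and reduce the lag. Kafka assigns each partition to at most one consumer within a group, so consumers beyond the topic's partition count stay idle; if the group already has one consumer per partition, the topic needs more partitions before extra consumers can help.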
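A back-of-the-envelope calculation helps decide how many consumers to run. The sketch below assumes you have measured the topic's produce rate and the throughput of a single consumer, both in messages per second; the numbers in the example are invented.

```python
import math
from typing import Optional


def consumers_needed(produce_rate: float, per_consumer_rate: float, partitions: int) -> int:
    """Consumers required to keep up with producers, capped at the partition count."""
    if per_consumer_rate <= 0:
        raise ValueError("per_consumer_rate must be positive")
    wanted = math.ceil(produce_rate / per_consumer_rate)
    # A group cannot usefully run more consumers than there are partitions.
    return min(max(wanted, 1), partitions)


def drain_time_seconds(lag: int, produce_rate: float, consume_rate: float) -> Optional[float]:
    """Seconds to clear the backlog, or None if consumption does not outpace production."""
    net = consume_rate - produce_rate
    if net <= 0:
        return None
    return lag / net


# Invented figures: 5000 msg/s produced, 800 msg/s per consumer, 12 partitions.
print(consumers_needed(5000, 800, 12))       # 7
print(drain_time_seconds(18000, 5000, 5600))  # 30.0
```

Keeping up with the produce rate only stops lag from growing; clearing an existing backlog needs total consumer throughput above the produce rate, which is what `drain_time_seconds` estimates.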
