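The Kafka Consumer Group Lag incident refers to a situation where the lag of a Kafka consumer group exceeds the expected threshold. Lag is the gap between the latest offset written to a partition and the offset the group has committed for it. Sustained lag delays data processing and, if messages expire under the topic's retention settings before they are consumed, can lose data, leading to service degradation or failure.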
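As a rough illustration of the threshold check described above, the sketch below sums per-partition lag from two offset maps. The function names, the dictionary layout, and the choice to skip partitions without a committed offset are assumptions made for this example, not part of any Kafka client API.

```python
from typing import Dict


def total_lag(end_offsets: Dict[int, int], committed_offsets: Dict[int, int]) -> int:
    """Sum (log end offset - committed offset) over all partitions.

    Both arguments map partition number -> offset. Partitions with no
    committed offset are skipped (an assumption of this sketch).
    """
    total = 0
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition)
        if committed is None:
            continue
        # Clamp at zero in case the two offsets were sampled at different times.
        total += max(end - committed, 0)
    return total


def lag_incident(end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int],
                 threshold: int) -> bool:
    """Return True when the group's total lag is above the threshold."""
    return total_lag(end_offsets, committed_offsets) > threshold
```

For example, `lag_incident({0: 150, 1: 200}, {0: 100, 1: 200}, threshold=40)` reports an incident because the group is 50 messages behind on partition 0.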
Debug
Find out the brokers in the Kafka cluster
Check the status of the Kafka brokers and ZooKeeper
Check the consumer group status
Check the partition lag for the consumer group
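One way to do this is to run `kafka-consumer-groups.sh --bootstrap-server <host:port> --describe --group <group>` and read the LAG column. The sketch below parses that kind of tabular output into per-partition lag. It assumes the output has a header row containing TOPIC, PARTITION, and LAG columns, which can vary between Kafka versions; `parse_describe_output` is a hypothetical helper name.

```python
from typing import Dict, List, Optional, Tuple


def parse_describe_output(text: str) -> Dict[Tuple[str, int], int]:
    """Map (topic, partition) -> lag from describe-style tabular output.

    Rows whose LAG value is not a number (for example "-" when the group
    has no committed offset) are skipped.
    """
    lags: Dict[Tuple[str, int], int] = {}
    header: Optional[List[str]] = None
    for line in text.splitlines():
        cols = line.split()
        if not cols:
            continue
        if "TOPIC" in cols and "PARTITION" in cols and "LAG" in cols:
            header = cols  # remember column order from the header row
            continue
        if header is None or len(cols) < len(header):
            continue
        row = dict(zip(header, cols))
        if not row["LAG"].isdigit() or not row["PARTITION"].isdigit():
            continue
        lags[(row["TOPIC"], int(row["PARTITION"]))] = int(row["LAG"])
    return lags
```

Sorting the result by value shows which partitions are furthest behind, which helps distinguish a single slow partition from lag spread across the whole group.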
Check topic metadata to see whether any topic's partitions are unevenly distributed across brokers
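"Unbalanced" can mean several things. The sketch below takes one interpretation, partition leadership concentrated on a few brokers, and flags a topic when the busiest broker leads noticeably more partitions than the least busy one. The input mapping would come from topic metadata (for example the Leader field printed by `kafka-topics.sh --describe`); the function names and the `tolerance` parameter are assumptions for illustration.

```python
from typing import Dict, Iterable


def leader_counts(partition_leaders: Dict[int, int],
                  brokers: Iterable[int]) -> Dict[int, int]:
    """Count how many partitions each broker leads.

    partition_leaders maps partition number -> leader broker id.
    brokers lists every broker id so idle brokers show up with a count of 0.
    """
    counts = {broker: 0 for broker in brokers}
    for leader in partition_leaders.values():
        counts[leader] = counts.get(leader, 0) + 1
    return counts


def is_unbalanced(partition_leaders: Dict[int, int],
                  brokers: Iterable[int],
                  tolerance: int = 1) -> bool:
    """True when leader counts differ by more than `tolerance` across brokers."""
    counts = leader_counts(partition_leaders, brokers)
    if not counts:
        return False
    return max(counts.values()) - min(counts.values()) > tolerance
```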
Check the disk space usage on the Kafka brokers
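A minimal sketch of this check, meant to run on the broker host itself against the directory that holds the Kafka log segments. The 85% limit and the function names are assumptions; the percentage may differ slightly from what `df` reports because of filesystem reserved blocks.

```python
import shutil


def disk_used_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_disk_pressure(path: str, max_used_percent: float = 85.0) -> bool:
    """True when usage is at or above the limit (85% is an assumed default)."""
    return disk_used_percent(path) >= max_used_percent
```

Pass the broker's configured `log.dirs` location as `path`; a full data disk stops the broker from accepting writes and is a common reason for brokers going offline.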
Check the network connectivity between the brokers and the consumer group
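A simple way to test reachability from a consumer host is to attempt a TCP connection to each broker's listener port. The sketch below does only that: it confirms the port accepts connections, not that the Kafka protocol handshake or authentication succeeds. The helper name and the default timeout are assumptions.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run it against every broker address the consumers use, since a client must reach the partition leaders directly and not just the bootstrap server.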
Check the ZooKeeper logs for any errors
Check the Kafka broker logs for any errors
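For either log, a coarse first pass is to pull out lines logged at ERROR or FATAL level. The sketch below matches the uppercase level token anywhere in the line, so it may also catch messages that merely contain the word; the exact log layout depends on the logging configuration in use, and `find_error_lines` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

# Matches the log level as a whole uppercase word (case-sensitive on purpose).
LEVEL_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def find_error_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped text) for ERROR/FATAL lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if LEVEL_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Because a file object is an iterable of lines, an open log file can be passed directly, for example `find_error_lines(open(path, encoding="utf-8", errors="replace"))`.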
Possible cause: one or more Kafka brokers in the cluster are down or experiencing high latency, causing the consumer group to fall behind in processing messages.
Repair
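Increase the number of consumers in the group to handle the load and reduce the lag. Within a group each partition is read by at most one consumer, so consumers beyond the topic's partition count sit idle; add partitions first if the group already has one consumer per partition.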
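A back-of-the-envelope sizing sketch for this repair: divide the incoming message rate by what one consumer can process, round up, and cap the result at the partition count, since a partition is assigned to at most one consumer in a group. The rates are inputs you would measure yourself; the function name and parameters are assumptions for illustration.

```python
import math


def consumers_needed(produce_rate: float,
                     per_consumer_rate: float,
                     partitions: int) -> int:
    """Estimate a consumer count that keeps up with the produce rate.

    produce_rate and per_consumer_rate are in messages per second.
    The result never exceeds `partitions`, because extra consumers
    in the same group would receive no assignments.
    """
    if per_consumer_rate <= 0:
        raise ValueError("per_consumer_rate must be positive")
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    wanted = math.ceil(produce_rate / per_consumer_rate)
    return max(1, min(wanted, partitions))
```

If the result equals the partition count and lag still grows, scaling consumers alone will not help; the topic needs more partitions or each consumer needs to process faster.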