Runbook

Kafka ZooKeeper Session Expiry Incident

Overview

This incident occurs when the session between a Kafka broker and the ZooKeeper ensemble expires, which interrupts the broker's connection to ZooKeeper. ZooKeeper removes the broker's ephemeral registration when the session expires, so the broker can be temporarily unavailable to the rest of the cluster until it reconnects and registers again. Depending on replication and producer settings, this can lead to downtime or data loss. Resolve the incident promptly to restore normal operation.

Debug

Check ZooKeeper status
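One way to script this check is with ZooKeeper's four-letter-word commands. This is a minimal sketch, assuming the commands `ruok` and `mntr` are enabled on the ensemble (on ZooKeeper 3.5 and later they must be allowed through `4lw.commands.whitelist`). The function names are illustrative and the host in the usage note is a placeholder.

```python
import socket


def four_letter_word(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter-word command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(reply: str) -> dict:
    """Turn tab-separated `mntr` output into a dict of metric name -> value."""
    metrics = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            metrics[key.strip()] = value.strip()
    return metrics
```

For example, `four_letter_word("zk1.example.internal", 2181, "ruok")` should return `imok` from a running server, and `parse_mntr(four_letter_word("zk1.example.internal", 2181, "mntr")).get("zk_server_state")` reports whether that node is a leader, follower, or standalone.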

Check Kafka status
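A quick first check is whether each broker's listener accepts TCP connections. The sketch below assumes the broker listens on port 9092, which is a common default and may differ in your setup. It only shows that something is listening on the port, not that the broker is registered with ZooKeeper or healthy.

```python
import socket


def broker_reachable(host: str, port: int = 9092, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the broker listener succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable host
        return False
```

A broker that is reachable here but missing from the cluster is worth checking against the log steps that follow.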

Check ZooKeeper logs for errors

Check Kafka logs for errors
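The two log checks above can be sketched as a small scanner. The message formats here are assumptions based on typical ZooKeeper server and client log output, and they vary between versions, so adjust the patterns to what your logs actually contain.

```python
import re

# Assumed ZooKeeper server message format; verify against your own logs.
ZK_EXPIRY = re.compile(
    r"Expiring session (0x[0-9a-fA-F]+), timeout of (\d+)ms exceeded"
)


def zk_expired_sessions(lines):
    """Return (session_id, timeout_ms) for each expiry line in a ZooKeeper log."""
    found = []
    for line in lines:
        match = ZK_EXPIRY.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


def kafka_expiry_lines(lines):
    """Return Kafka log lines that appear to mention a ZooKeeper session expiry."""
    hits = []
    for line in lines:
        low = line.lower()
        # Heuristic: the word "expired" together with "session" or "zookeeper".
        if "expired" in low and ("session" in low or "zookeeper" in low):
            hits.append(line.rstrip("\n"))
    return hits
```

Both functions accept any iterable of lines, so an open file object can be passed directly. Matching the session IDs from the ZooKeeper side against the Kafka side helps confirm which broker lost its session.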

Check the ZooKeeper session timeout value
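On the broker side, the session timeout is set by `zookeeper.session.timeout.ms` in the broker properties file. The sketch below reads that value from the file's text. It is a simplified parser that assumes plain `key=value` lines and does not handle every feature of the Java properties format.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def session_timeout_ms(props: dict):
    """Return zookeeper.session.timeout.ms as an int, or None if it is not set."""
    value = props.get("zookeeper.session.timeout.ms")
    return int(value) if value is not None else None
```

A result of `None` means the property is absent and the broker falls back to the default for its Kafka version.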

Check the Kafka broker configuration for ZooKeeper connection string
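The connection string is the broker's `zookeeper.connect` value: a comma-separated list of `host:port` entries with an optional chroot path at the end. As a rough sketch, it can be split like this to confirm that every ensemble member and the expected chroot are present. The function name is illustrative, and IPv6 literals are not handled.

```python
def parse_zk_connect(value: str):
    """Split a zookeeper.connect value into ([(host, port), ...], chroot)."""
    hosts_part, slash, chroot = value.strip().partition("/")
    chroot = "/" + chroot if slash else ""
    servers = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:  # no port given; 2181 is ZooKeeper's default client port
            host, port = entry, "2181"
        servers.append((host, int(port)))
    return servers, chroot
```

Each `(host, port)` pair returned here can then be probed individually to find ensemble members the broker cannot reach.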

Check the Kafka topic configuration for replication factor
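Replication matters here because a topic with a replication factor of 1 has no other replica to take over while its broker is out of the cluster. The sketch below parses the text printed by the topic describe tool (`kafka-topics.sh --describe`). The field layout it expects is an assumption and can differ between Kafka versions.

```python
import re

# Assumed layout of the describe output; verify against your Kafka version.
HEADER_LINE = re.compile(r"Topic:\s*(\S+).*?ReplicationFactor:\s*(\d+)")
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def replication_factors(describe_output: str) -> dict:
    """Return {topic: replication_factor} from the topic summary lines."""
    factors = {}
    for line in describe_output.splitlines():
        match = HEADER_LINE.search(line)
        if match:
            factors[match.group(1)] = int(match.group(2))
    return factors


def under_replicated(describe_output: str):
    """Return (topic, partition, replicas, isr) where the ISR is smaller than the replica set."""
    problems = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if not match:
            continue
        topic, partition, _leader, replicas, isr = match.groups()
        replica_ids = [r for r in replicas.split(",") if r]
        isr_ids = [i for i in isr.split(",") if i]
        if len(isr_ids) < len(replica_ids):
            problems.append((topic, int(partition), replica_ids, isr_ids))
    return problems
```

Partitions returned by `under_replicated` shortly after an expiry usually point at the broker that dropped out of the in-sync replica set.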

Check the Kafka consumer groups for active consumers

Check the Kafka consumer group lag for a specific topic
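Lag for a partition is the log end offset minus the group's committed offset. The sketch below computes it from two plain dictionaries keyed by partition number. How those offsets are collected (the consumer groups CLI, a client library, or a metrics system) is left open, and the function names are illustrative.

```python
def partition_lag(end_offsets: dict, committed: dict) -> dict:
    """Lag per partition = log end offset - committed offset (never negative)."""
    lag = {}
    for partition, end in end_offsets.items():
        done = committed.get(partition)
        # A partition with no committed offset is reported as None rather than guessed.
        lag[partition] = None if done is None else max(end - done, 0)
    return lag


def total_lag(lag: dict) -> int:
    """Sum the lag over partitions that have a committed offset."""
    return sum(value for value in lag.values() if value is not None)
```

Lag that keeps growing after the brokers have re-registered suggests consumers have not recovered from the expiry on their own.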

Repair

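Increase the ZooKeeper session timeout value (zookeeper.session.timeout.ms on the brokers) so that short pauses, such as long garbage collection pauses or brief network interruptions, do not expire the session.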
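Raising the broker setting only helps up to the limit the ZooKeeper server allows. The server clamps the requested timeout between `minSessionTimeout` and `maxSessionTimeout`, which default to 2 and 20 times `tickTime`. The sketch below reproduces that arithmetic. The default `tick_time_ms` of 2000 is an assumption, so pass your ensemble's actual value.

```python
def negotiated_timeout_ms(requested_ms: int,
                          tick_time_ms: int = 2000,
                          min_session_timeout_ms=None,
                          max_session_timeout_ms=None) -> int:
    """Return the session timeout the server would grant for a requested value."""
    low = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    high = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    # The server never grants less than `low` or more than `high`.
    return max(low, min(requested_ms, high))
```

With a `tickTime` of 2000 ms, a broker that requests 60000 ms is granted only 40000 ms, so a larger timeout also requires raising `maxSessionTimeout` on the ZooKeeper servers.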

Configure Kafka to automatically re-establish a connection when a ZooKeeper session expires.
