This incident type refers to the need to check for gray disks on a Kafka node. A gray disk is a disk that has failed or is failing, and it can cause data loss or interruption in the Kafka cluster. This check is important to ensure that the Kafka cluster is running smoothly and that data is being properly replicated across all nodes.
Parameters
Debug
Check if any disk is at critical threshold
Check if there are any disk errors
Check if there are any disk failures
Check if there are any Kafka log errors
Check the replication status of topics
Check the health of the Kafka cluster
Repair
Replace the gray disk with a new one.
Learn more
Related Runbooks
Check out these related runbooks to help you debug and resolve similar issues.