Runbook

Gray Disk on Kafka Node


Overview

This incident type covers checking a Kafka node for gray disks. A gray disk is a disk that has failed or is failing, often in ways that are hard to detect, such as intermittent I/O errors or unusually slow reads and writes. Because Kafka brokers store partition logs on local disks, a gray disk can cause data loss or interrupt service in the Kafka cluster. Running this check helps keep the cluster healthy and confirms that data is being replicated correctly across all nodes.

Parameters

Debug

Check whether any disk has reached a critical usage threshold
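A minimal sketch of the threshold check in Python, assuming the broker's data directories (its `log.dirs`) are local mount points. The 90% threshold, the example path, and the function name `disks_over_threshold` are hypothetical placeholders, since the runbook does not specify them.

```python
import shutil

# Hypothetical threshold: the runbook does not define a specific value.
CRITICAL_PERCENT = 90.0


def disks_over_threshold(paths, threshold=CRITICAL_PERCENT):
    """Return (path, percent_used) for each path whose filesystem usage is at or above threshold."""
    flagged = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        if usage.total == 0:
            continue  # skip pseudo filesystems that report no capacity
        percent_used = 100.0 * usage.used / usage.total
        if percent_used >= threshold:
            flagged.append((path, round(percent_used, 1)))
    return flagged


if __name__ == "__main__":
    # Hypothetical mount point; replace with the broker's actual log.dirs entries.
    for path, pct in disks_over_threshold(["/"]):
        print(f"CRITICAL: {path} is {pct}% full")
```

Usage is computed as used/total, which can read slightly lower than `df` on filesystems that reserve blocks for root.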

Check if there are any disk errors
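One way to look for disk errors is to scan the kernel log. The sketch below assumes the patterns listed are typical Linux kernel messages for block-device trouble; they vary by kernel version and storage driver, so the list is a starting point rather than a complete catalogue. The function names are hypothetical.

```python
import re
import subprocess

# Assumed patterns for common Linux kernel disk-error messages (not exhaustive).
DISK_ERROR_PATTERNS = [
    r"I/O error",                                   # e.g. "blk_update_request: I/O error, dev sdb"
    r"Medium Error",                                # SCSI sense key for unreadable media
    r"Unrecovered read error",
    r"\bata\d+(\.\d+)?: (failed command|exception)",  # libata command failures
]
DISK_ERROR_RE = re.compile("|".join(DISK_ERROR_PATTERNS), re.IGNORECASE)


def find_disk_errors(lines):
    """Return the kernel log lines that match any disk-error pattern."""
    return [line for line in lines if DISK_ERROR_RE.search(line)]


def read_kernel_log():
    """Read the kernel ring buffer with dmesg (may need root if dmesg_restrict is set)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in find_disk_errors(read_kernel_log()):
        print(line)
```

On systemd hosts, `journalctl -k` output can be fed to `find_disk_errors` instead of `dmesg`.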

Check if there are any disk failures
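A disk's own failure verdict can be read from SMART data with `smartctl -H` from smartmontools. The sketch assumes the two verdict formats smartctl commonly prints (`self-assessment test result:` for ATA and NVMe, `SMART Health Status:` for SCSI/SAS); the function names are hypothetical.

```python
import re
import subprocess


def parse_smart_health(output):
    """Return True if SMART reports healthy, False if failing, None if no verdict is found."""
    ata = re.search(r"self-assessment test result:\s*(\S+)", output)
    if ata:
        return ata.group(1).upper().startswith("PASSED")
    scsi = re.search(r"SMART Health Status:\s*(\S+)", output)
    if scsi:
        return scsi.group(1).upper() == "OK"
    return None


def smart_health(device):
    """Run 'smartctl -H' on a device such as /dev/sda (usually requires root)."""
    # check=False: smartctl uses non-zero exit bits to flag problems, yet still prints output.
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True, check=False)
    return parse_smart_health(result.stdout)
```

A PASSED verdict does not rule out a gray disk: SMART can report a disk as healthy while it is already returning intermittent errors or slow I/O, so combine this with the kernel log and Kafka checks.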

Check the Kafka broker logs for errors
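A minimal log scan, assuming the broker uses Kafka's default log4j layout (`[timestamp] LEVEL message (logger)`). The storage markers are assumptions chosen to catch disk-related failures; adjust them to what your broker version actually logs. The function name and the log path in the usage note are hypothetical.

```python
import re
from collections import Counter

# Assumes the default layout: "[2024-05-01 10:00:00,000] ERROR message (logger)".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")

# Assumed markers of storage trouble; they may also appear on stack-trace lines.
STORAGE_MARKERS = ("KafkaStorageException", "IOException", "Stopping serving logs in dir")


def summarize_kafka_errors(lines):
    """Count ERROR/FATAL entries and collect any line that mentions a storage marker."""
    level_counts = Counter()
    storage_lines = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") in ("ERROR", "FATAL"):
            level_counts[m.group("level")] += 1
        # Checked on every line, so stack-trace continuations are included too.
        if any(marker in line for marker in STORAGE_MARKERS):
            storage_lines.append(line.rstrip("\n"))
    return level_counts, storage_lines
```

For example, `summarize_kafka_errors(open("/var/log/kafka/server.log"))` (hypothetical path) returns the error counts and the storage-related lines to inspect.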

Check the replication status of topics
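Replication status can be read with `kafka-topics.sh --bootstrap-server <host:port> --describe --under-replicated-partitions`. The sketch below parses the partition lines of that output and groups out-of-sync replicas by broker; if one broker ID dominates, that broker is a likely candidate for the gray disk. It assumes the tab-separated `Key: value` layout that recent Kafka versions print, and the function names are hypothetical.

```python
def parse_partition_lines(output):
    """Parse the partition lines of 'kafka-topics.sh --describe' output into dicts."""
    partitions = []
    for line in output.splitlines():
        # Topic header lines have "PartitionCount:" but no "Partition:" or "Replicas:".
        if "Partition:" not in line or "Replicas:" not in line:
            continue
        fields = {}
        for chunk in line.strip().split("\t"):
            key, sep, value = chunk.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        partitions.append(fields)
    return partitions


def _broker_ids(value):
    """Turn a comma-separated list such as '1,2,3' into [1, 2, 3]."""
    return [int(x) for x in value.split(",") if x.strip()]


def out_of_sync_replicas(output):
    """Return {broker_id: ['topic-partition', ...]} for replicas missing from the ISR."""
    missing = {}
    for p in parse_partition_lines(output):
        isr = set(_broker_ids(p.get("Isr", "")))
        for broker in _broker_ids(p.get("Replicas", "")):
            if broker not in isr:
                missing.setdefault(broker, []).append(f"{p['Topic']}-{p['Partition']}")
    return missing
```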

Check the health of the Kafka cluster
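Part of the cluster health check is whether any broker has taken a log directory offline, which Kafka does when a disk under `log.dirs` starts failing. `kafka-log-dirs.sh --bootstrap-server <host:port> --describe` reports each log directory with an `error` field. The sketch assumes the tool prints a JSON document on its own line after its status messages, with a non-null `error` for unhealthy directories; the exact format may differ between Kafka versions, and the function name is hypothetical.

```python
import json


def offline_log_dirs(output):
    """Return [(broker, log_dir, error)] for log directories that report an error."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("{"):
            data = json.loads(line)
            break
    else:
        raise ValueError("no JSON document found in kafka-log-dirs output")

    problems = []
    for broker in data.get("brokers", []):
        for log_dir in broker.get("logDirs", []):
            if log_dir.get("error"):  # null/None means the directory is healthy
                problems.append((broker["broker"], log_dir["logDir"], log_dir["error"]))
    return problems
```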

Repair

Replace the gray disk with a new one.
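Replacing the disk usually involves taking the affected broker offline. Before doing so, it can help to confirm that no partition would fall below `min.insync.replicas`, because producers using `acks=all` cannot write to such partitions. The sketch below is a hypothetical pre-flight check; the `PartitionState` type and the function name are illustrative, and the ISR data would come from the replication check in the Debug section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionState:
    """Hypothetical snapshot of one partition's in-sync replica set."""
    topic: str
    partition: int
    isr: List[int]  # broker IDs currently in sync


def partitions_at_risk(broker_id, partitions, min_insync_replicas):
    """Return partitions whose ISR would drop below min.insync.replicas if broker_id stops."""
    at_risk = []
    for p in partitions:
        if broker_id in p.isr and len(p.isr) - 1 < min_insync_replicas:
            at_risk.append(f"{p.topic}-{p.partition}")
    return at_risk
```

If the list is not empty, those partitions' followers need to catch up (or be reassigned) before the broker is stopped.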
