Runbook

Inadequate Replication on Cassandra Cluster

Overview

Inadequate replication means that a keyspace in a distributed system such as a Cassandra cluster has a replication factor that is too low, or a replication strategy that does not suit the cluster's topology. If one or more nodes fail, data held only on those nodes cannot be retrieved, which can mean substantial data loss. This incident requires immediate attention: review the replication factor and strategy and correct them to prevent data loss in the future.

Debug

Check the number of nodes in the Cassandra cluster

Check the replication factor of the affected keyspace

Check the replication strategy of the affected keyspace
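
The replication factor and strategy of a keyspace are stored together in the `replication` column of `system_schema.keyspaces`. As a rough sketch, the two checks above could be applied to that map as follows. The function name `assess_replication`, the data center names, and the minimum of 3 replicas per data center are assumptions made for illustration, not values from this runbook.

```python
def assess_replication(replication, datacenters, min_rf=3):
    """Return human-readable warnings for one keyspace's replication map.

    replication: the map stored in system_schema.keyspaces, for example
                 {"class": "...NetworkTopologyStrategy", "dc1": "3"}
    datacenters: names of the data centers that should hold replicas
    min_rf:      assumed minimum acceptable replicas per data center
    """
    warnings = []
    # The class is usually stored fully qualified; keep only the short name.
    strategy = replication.get("class", "").rsplit(".", 1)[-1]

    if strategy == "SimpleStrategy":
        rf = int(replication.get("replication_factor", 0))
        if len(datacenters) > 1:
            warnings.append("SimpleStrategy ignores data center boundaries")
        if rf < min_rf:
            warnings.append(f"replication_factor {rf} is below {min_rf}")
    elif strategy == "NetworkTopologyStrategy":
        for dc in datacenters:
            rf = int(replication.get(dc, 0))
            if rf < min_rf:
                warnings.append(f"{dc} has {rf} replicas, below {min_rf}")
    else:
        warnings.append(f"unrecognized strategy: {strategy or 'missing'}")
    return warnings


if __name__ == "__main__":
    # Hypothetical keyspace settings, used only to exercise the function.
    example = {
        "class": "org.apache.cassandra.locator.NetworkTopologyStrategy",
        "dc1": "3",
        "dc2": "2",
    }
    for warning in assess_replication(example, ["dc1", "dc2"]):
        print(warning)
```

An empty list means the keyspace meets the assumed minimum in every listed data center.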

Check the status of the affected node(s)

Check the status of the affected data center(s)

Check the health of the Cassandra cluster
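
The node count, node status, and data center status checks above can all be read from the output of `nodetool status`. The following is a minimal sketch that groups nodes by data center and state code, assuming the usual layout of that output (a `Datacenter:` header followed by rows that start with a two-letter code such as `UN` or `DN`). The sample text, addresses, and host IDs are made up for illustration.

```python
from collections import defaultdict

# Illustrative sample only; real output comes from running `nodetool status`.
SAMPLE = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.0.1  103.5 KiB  256     66.7%             host-id-1  rack1
DN  10.0.0.2  98.2 KiB   256     66.7%             host-id-2  rack1

Datacenter: dc2
===============
--  Address   Load       Tokens  Owns (effective)  Host ID    Rack
UN  10.0.1.1  101.0 KiB  256     100.0%            host-id-3  rack1
"""


def summarize_status(status_text):
    """Group node addresses by data center and two-letter state code."""
    summary = defaultdict(lambda: defaultdict(list))
    datacenter = None
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Datacenter:" and len(fields) > 1:
            datacenter = fields[1]
        elif (datacenter and len(fields) > 1 and len(fields[0]) == 2
              and fields[0][0] in "UD" and fields[0][1] in "NLJM"):
            # First letter: Up/Down. Second: Normal/Leaving/Joining/Moving.
            summary[datacenter][fields[0]].append(fields[1])
    return {dc: dict(states) for dc, states in summary.items()}


def unhealthy_nodes(summary):
    """Return (datacenter, state, address) for every node that is not UN."""
    return [(dc, state, addr)
            for dc, states in summary.items()
            for state, addrs in states.items() if state != "UN"
            for addr in addrs]


if __name__ == "__main__":
    summary = summarize_status(SAMPLE)
    for dc, states in summary.items():
        print(dc, sum(len(addrs) for addrs in states.values()), "nodes")
    print(unhealthy_nodes(summary))
```

A data center with fewer live nodes than its configured replication factor cannot hold the full set of replicas, so the per-data-center counts are worth comparing with the keyspace settings.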

Check the consistency level of the affected queries
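
The consistency level matters because it determines how many replicas must respond, and therefore how many replica failures a query can survive at a given replication factor. A small worked sketch of that arithmetic follows, assuming a single data center and the standard quorum size of floor(RF / 2) + 1. The function names are hypothetical.

```python
def replicas_required(level, rf):
    """Replicas that must respond for a consistency level at replication factor rf."""
    quorum = rf // 2 + 1
    required = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum, "ALL": rf}
    if level not in required:
        raise ValueError(f"unsupported consistency level: {level}")
    if required[level] > rf:
        raise ValueError(f"{level} can never be met with only {rf} replicas")
    return required[level]


def tolerated_failures(level, rf):
    """Replica nodes that can be down while the level can still be satisfied."""
    return rf - replicas_required(level, rf)


def reads_see_latest_write(read_level, write_level, rf):
    """True when read and write replica sets must overlap (R + W > RF)."""
    return (replicas_required(read_level, rf)
            + replicas_required(write_level, rf)) > rf


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: QUORUM tolerates "
              f"{tolerated_failures('QUORUM', rf)} replica failure(s)")
```

With a replication factor of 1 or 2, `QUORUM` tolerates no replica failures at all, which is one way inadequate replication shows up as query errors before any data is lost.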

Repair

Increase the replication factor: Raise the number of replicas configured for each data center in the affected keyspace. Multiple copies of the data are then stored on different nodes, which reduces the risk of data loss if a node fails. Existing data is not copied to the new replicas automatically, so run a full repair on the keyspace after the change.
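
As a hedged sketch of this repair step, the helper below builds the CQL `ALTER KEYSPACE` statement for a given replica count per data center. It assumes `NetworkTopologyStrategy` is the intended strategy. The helper name, the keyspace name `orders`, and the data center names are hypothetical, and the keyspace name check is deliberately conservative.

```python
import re


def alter_keyspace_statement(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    keyspace:        unquoted keyspace name (letters, digits, underscores)
    replicas_per_dc: mapping of data center name to desired replica count
    """
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", keyspace):
        raise ValueError(f"unexpected keyspace name: {keyspace!r}")
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(replicas_per_dc.items()):
        if not isinstance(rf, int) or rf < 1:
            raise ValueError(f"invalid replica count for {dc}: {rf!r}")
        safe_dc = dc.replace("'", "''")  # escape single quotes for CQL
        options.append(f"'{safe_dc}': {rf}")

    body = ", ".join(options)
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{body}}};"


if __name__ == "__main__":
    # Hypothetical keyspace and data centers.
    print(alter_keyspace_statement("orders", {"dc1": 3, "dc2": 3}))
```

The generated statement would be run through `cqlsh`, followed by `nodetool repair --full` for the keyspace on each node so that the new replicas receive the existing data.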