Runbook

Cassandra Tombstone Dump Incident

Overview

A Cassandra tombstone dump incident occurs when a Cassandra table accumulates too many tombstones (markers for deleted data). Reads must scan past these markers, so performance degrades, and queries can fail outright once the tombstone failure threshold is exceeded. This type of incident requires immediate attention from a software engineer because it can undermine the stability and availability of the whole system. Possible causes include a misconfigured garbage collector or an application that generates too many tombstones, for example through frequent deletes or short TTLs.

Debug

Check Cassandra's status
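One way to script this check is to parse the output of `nodetool status`. The sketch below assumes `nodetool` is on the PATH and can reach the local node; the function names are illustrative and not part of any Cassandra tooling.

```python
import subprocess


def parse_nodetool_status(output: str) -> list[tuple[str, str]]:
    """Return (state, address) pairs from `nodetool status` text."""
    nodes = []
    for line in output.splitlines():
        parts = line.split()
        # Node rows start with a two-letter code: U/D (up/down) followed by
        # N/L/J/M (normal/leaving/joining/moving), then the node address.
        if (
            len(parts) >= 2
            and len(parts[0]) == 2
            and parts[0][0] in "UD"
            and parts[0][1] in "NLJM"
        ):
            nodes.append((parts[0], parts[1]))
    return nodes


def down_nodes(output: str) -> list[str]:
    """Return the addresses of nodes reported as down."""
    return [addr for state, addr in parse_nodetool_status(output) if state.startswith("D")]


def cluster_status() -> list[tuple[str, str]]:
    # Assumption: `nodetool` is installed locally and needs no extra credentials.
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return parse_nodetool_status(result.stdout)
```

Calling `cluster_status()` on a node returns one pair per cluster member; passing the same text to `down_nodes` lists any members that are down.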

Check for any errors in the Cassandra system log
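A minimal sketch of this check, assuming the log lives at `/var/log/cassandra/system.log` (a common location for package installs, but not guaranteed) and uses the default layout in which each entry starts with its level. The helper names are hypothetical.

```python
from pathlib import Path

# Assumed log location; adjust for your deployment.
SYSTEM_LOG = Path("/var/log/cassandra/system.log")


def error_lines(lines, levels=("ERROR",)):
    """Yield log lines whose first token is one of the given levels."""
    for line in lines:
        if line.split(" ", 1)[0] in levels:
            yield line.rstrip("\n")


def recent_errors(path=SYSTEM_LOG, limit=20):
    """Return the last `limit` ERROR lines from the log file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(error_lines(handle))[-limit:]
```

Passing `levels=("ERROR", "WARN")` to `error_lines` widens the filter to warnings as well.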

Check if any tombstone threshold has been exceeded
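Cassandra logs a warning when a read crosses `tombstone_warn_threshold` and aborts the query when it crosses `tombstone_failure_threshold`. The sketch below scans log lines for both kinds of message. The exact wording varies between Cassandra versions, so the two patterns are assumptions to verify against your own logs.

```python
import re

# Assumed message fragments; confirm them against your Cassandra version.
WARN_RE = re.compile(r"Read (\d+) live rows and (\d+) tombstone cells")
FAIL_RE = re.compile(r"Scanned over (\d+) tombstones")


def tombstone_events(lines):
    """Return (kind, tombstone_count) for each threshold message found."""
    events = []
    for line in lines:
        fail = FAIL_RE.search(line)
        warn = WARN_RE.search(line)
        if fail:
            events.append(("failure", int(fail.group(1))))
        elif warn:
            events.append(("warning", int(warn.group(2))))
    return events
```

Any `"failure"` entry means at least one query was aborted; a steady stream of `"warning"` entries usually points at the table named in the logged query.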

Check the number of tombstones per partition
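`nodetool tablestats` reports average and maximum tombstones per slice (that is, per read) over the last five minutes, which is a practical proxy for per-partition tombstone counts. The sketch below parses those two lines; it assumes the output covers a single table, and the keyspace and table names passed to it are placeholders.

```python
import re
import subprocess

SLICE_RE = re.compile(
    r"^\s*(Average|Maximum) tombstones per slice \(last five minutes\):\s*([0-9.]+|NaN)",
    re.MULTILINE,
)


def tombstones_per_slice(tablestats_output: str) -> dict[str, float]:
    """Extract the average and maximum tombstones-per-slice figures."""
    return {
        kind.lower(): float(value)
        for kind, value in SLICE_RE.findall(tablestats_output)
    }


def table_tombstone_stats(keyspace: str, table: str) -> dict[str, float]:
    # Assumption: `nodetool` is on the PATH and can reach the local node.
    result = subprocess.run(
        ["nodetool", "tablestats", f"{keyspace}.{table}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return tombstones_per_slice(result.stdout)
```

A maximum close to the configured failure threshold suggests that some partitions are at risk of aborted reads.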

Check the size of the SSTable files on disk (tombstones are stored inside SSTables, not in separate files)
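Tombstones are stored inside SSTable data files rather than in files of their own, so one way to approximate this check is to total the `*-Data.db` files per table. The sketch assumes the default data directory `/var/lib/cassandra/data` and the usual `keyspace/table-id/` layout; both may differ on your nodes.

```python
from pathlib import Path

# Assumed default data directory; check data_file_directories in cassandra.yaml.
DATA_DIR = Path("/var/lib/cassandra/data")


def sstable_bytes_by_table(data_dir=DATA_DIR) -> dict[str, int]:
    """Sum the sizes of SSTable data files for each keyspace/table directory."""
    totals: dict[str, int] = {}
    # The fixed depth skips nested snapshot and backup directories.
    for data_file in Path(data_dir).glob("*/*/*-Data.db"):
        table_dir = data_file.parent
        key = f"{table_dir.parent.name}/{table_dir.name}"
        totals[key] = totals.get(key, 0) + data_file.stat().st_size
    return totals
```

A table whose on-disk size stays large even though most of its rows have been deleted is a likely candidate for tombstone build-up.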

Check the garbage collector logs for any errors
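The format of the JVM's own GC log depends on the Java version and flags in use, so this sketch reads the pause summaries that Cassandra's `GCInspector` writes to the system log instead. The pattern is an assumption about that message format, and `min_ms` is an arbitrary illustrative cutoff.

```python
import re

# Assumed shape: "GCInspector.java:<line> - <collector name> GC in <N>ms."
GC_RE = re.compile(r"GCInspector\.java:\d+ - (.+?) GC in (\d+)ms")


def gc_pauses(lines, min_ms=1000):
    """Return (collector, pause_ms) for pauses of at least `min_ms`."""
    pauses = []
    for line in lines:
        match = GC_RE.search(line)
        if match and int(match.group(2)) >= min_ms:
            pauses.append((match.group(1), int(match.group(2))))
    return pauses
```

Long or frequent pauses that coincide with tombstone warnings suggest that tombstone-heavy reads are putting pressure on the heap.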

Check Cassandra's configuration file for any misconfigurations
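For tombstone incidents, the most relevant `cassandra.yaml` settings are `tombstone_warn_threshold` and `tombstone_failure_threshold`. The sketch below reads them with a simple line match rather than a full YAML parser and flags an inconsistent pair. The sanity rule is illustrative, and the key names should be confirmed for your Cassandra version.

```python
import re

KEYS = ("tombstone_warn_threshold", "tombstone_failure_threshold")
LINE_RE = re.compile(r"^(\w+):\s*(\d+)\s*(?:#.*)?$")


def read_tombstone_thresholds(yaml_text: str) -> dict[str, int]:
    """Return the active (uncommented) tombstone thresholds found in the text."""
    found = {}
    for line in yaml_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(1) in KEYS:
            found[match.group(1)] = int(match.group(2))
    return found


def threshold_problems(found: dict[str, int]) -> list[str]:
    """Flag settings that look inconsistent (an illustrative rule only)."""
    warn = found.get("tombstone_warn_threshold")
    fail = found.get("tombstone_failure_threshold")
    if warn is not None and fail is not None and warn >= fail:
        return ["warn threshold is not below the failure threshold"]
    return []
```

Keys that are commented out or absent are omitted from the result, in which case Cassandra falls back to its built-in defaults.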

Check if any nodes in the Cassandra cluster are down

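Misconfigured garbage collector: Cassandra purges tombstones during compaction, once a table's gc_grace_seconds has elapsed; the JVM garbage collector does not remove them. A misconfigured JVM garbage collector can still make the incident worse, because reads that scan many tombstones put pressure on the heap and long GC pauses then degrade performance and stability.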

Repair

Set the path to the Cassandra configuration file

Set the name of the garbage collector to use

Set the options for the garbage collector

Back up the original configuration file

Modify the garbage collector settings in the configuration file

Restart the Cassandra service to apply the changes
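The repair steps above can be sketched as follows, assuming the configuration file is a JVM options file (for example `jvm.options`, whose name and location vary by Cassandra version) and that the service is managed by systemd under the name `cassandra`. The function names, the `.bak` suffix, and the example flags are illustrative choices, not recommendations.

```python
import shutil
import subprocess
from pathlib import Path


def set_gc_flags(config_path, gc_flag, gc_options):
    """Back up the options file, then switch the active collector flags."""
    config_path = Path(config_path)
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)  # keep the original for rollback

    updated = []
    for line in config_path.read_text().splitlines():
        stripped = line.strip()
        # Comment out any active collector selection such as -XX:+UseG1GC.
        if stripped.startswith("-XX:+Use") and stripped.endswith("GC"):
            updated.append("#" + line)
        else:
            updated.append(line)
    updated.append(gc_flag)
    updated.extend(gc_options)
    config_path.write_text("\n".join(updated) + "\n")
    return backup_path


def restart_cassandra():
    # Assumption: a systemd unit named "cassandra"; requires sufficient privileges.
    subprocess.run(["systemctl", "restart", "cassandra"], check=True)
```

For example, `set_gc_flags("/etc/cassandra/jvm.options", "-XX:+UseG1GC", ["-XX:MaxGCPauseMillis=500"])` followed by `restart_cassandra()` applies the change on one node. The path is a placeholder, and restarting nodes one at a time keeps the cluster available.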
