Runbook

Cassandra Coordinator Query Latency Causing Timeout

Overview

This incident occurs when a coordinator node in a Cassandra cluster processes queries too slowly and client requests time out. The coordinator is the node that receives a client request and routes it to the replicas that hold the data; any node in the cluster can act as coordinator for a given request. If the coordinator cannot complete requests within the configured timeouts, clients see timeout errors and cannot retrieve the data they need. Common causes include high load on the cluster, network problems between nodes, and hardware faults.

Debug

Check the status of the Cassandra cluster
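
One way to run this check is to call `nodetool status` and flag every node that is not Up/Normal (`UN`). The sketch below assumes `nodetool` is on the PATH of the host where it runs; the helper names `run_nodetool` and `unhealthy_nodes` are illustrative and not part of any Cassandra tooling.

```python
import re
import subprocess

# Node rows in `nodetool status` start with a two-letter code:
# first letter U(p)/D(own), second N(ormal)/L(eaving)/J(oining)/M(oving).
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def run_nodetool(*args):
    """Run nodetool (assumed to be on PATH) and return its stdout as text."""
    result = subprocess.run(
        ["nodetool", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def unhealthy_nodes(status_output):
    """Return (state, address) pairs for nodes that are not Up/Normal."""
    found = []
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match and match.group(1) != "UN":
            found.append((match.group(1), match.group(2)))
    return found
```

On a cluster node, `unhealthy_nodes(run_nodetool("status"))` returns an empty list when every node reports `UN`. Any other state (for example `DN` for a down node) is worth investigating before looking at latency.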

List the Cassandra keyspaces to see if there are any issues with replication
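
Keyspace replication settings are stored in the `system_schema.keyspaces` table, whose `replication` column is a map of strings. The sketch below assumes you have already fetched those rows (for example with `SELECT keyspace_name, replication FROM system_schema.keyspaces;` in `cqlsh`) into a dictionary keyed by keyspace name. The function names and the `min_rf=3` threshold are illustrative assumptions; choose a threshold that matches your own consistency requirements.

```python
def replication_factors(replication):
    """Map each replication target to its integer factor.

    `replication` is the map from system_schema.keyspaces, for example
    {"class": "...SimpleStrategy", "replication_factor": "3"} or
    {"class": "...NetworkTopologyStrategy", "dc1": "3", "dc2": "2"}.
    """
    return {key: int(value) for key, value in replication.items() if key != "class"}


def under_replicated(keyspaces, min_rf=3):
    """Return the names of keyspaces whose lowest factor is below min_rf.

    Keyspaces with no numeric factor (such as those using LocalStrategy)
    are skipped. The min_rf default is an assumed threshold.
    """
    flagged = []
    for name, replication in keyspaces.items():
        factors = replication_factors(replication)
        if factors and min(factors.values()) < min_rf:
            flagged.append(name)
    return sorted(flagged)
```

A keyspace with a low replication factor concentrates reads on fewer replicas, which can make coordinator latency worse when one of those replicas is slow or down.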

Check the load on the Cassandra coordinator node
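
A rough first signal for host load is the system load average relative to the number of CPU cores. The sketch below is meant to run on the coordinator host itself and assumes a Unix-like system, since `os.getloadavg()` is not available on Windows. The function names and the threshold of 1.0 per core are illustrative assumptions, not Cassandra recommendations.

```python
import os


def load_per_core(load_average, cores):
    """Normalise a load average by the number of CPU cores."""
    return load_average / max(cores, 1)


def is_overloaded(load_average, cores, threshold=1.0):
    """True when the per-core load exceeds the (assumed) threshold."""
    return load_per_core(load_average, cores) > threshold


def local_load_report():
    """Summarise the load on the host this script runs on (Unix only)."""
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    return {
        "cores": cores,
        "load_1m": one_min,
        "load_5m": five_min,
        "load_15m": fifteen_min,
        "overloaded": is_overloaded(five_min, cores),
    }
```

Calling `local_load_report()` on the coordinator gives a quick snapshot. A sustained per-core load above the threshold suggests the node is CPU-bound or waiting on I/O, which is consistent with slow coordination.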

Check the network latency between nodes in the Cassandra cluster
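
Where ICMP ping is blocked or unavailable, timing a TCP connection to each peer gives a comparable estimate of round-trip latency. The sketch below assumes the peers listen on port 7000, the default inter-node `storage_port`; adjust it if your cluster uses a different port. The helper names and the peer addresses you pass in are illustrative.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


def measure_peers(peers, port=7000, attempts=3):
    """Return the lowest connect time per peer, or None if a peer is unreachable."""
    results = {}
    for host in peers:
        samples = [tcp_connect_ms(host, port) for _ in range(attempts)]
        reachable = [sample for sample in samples if sample is not None]
        results[host] = min(reachable) if reachable else None
    return results
```

Run `measure_peers([...])` from the coordinator with the addresses of the other nodes. A peer that returns `None`, or a connect time far above the others, points to a network problem rather than a Cassandra one.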

View the Cassandra nodetool output to see if there are any issues with the cluster
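
One `nodetool` view that is useful here is `nodetool tpstats`, which lists each thread pool with its active, pending, completed, and blocked task counts. The sketch below parses that table and flags pools with pending or blocked tasks. It assumes the usual column order (Pool Name, Active, Pending, Completed, Blocked, All time blocked), and `backed_up_pools` is an illustrative helper name.

```python
import re

# A pool row is a name without spaces followed by five integer columns.
POOL_ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def backed_up_pools(tpstats_output):
    """Return pools that currently have pending or blocked tasks.

    `tpstats_output` is the text printed by `nodetool tpstats`.
    Parsing stops at the dropped-message section that follows the pools.
    """
    flagged = {}
    for line in tpstats_output.splitlines():
        if line.startswith("Message type"):
            break
        match = POOL_ROW.match(line.strip())
        if not match:
            continue  # header, blank line, or a row in an unexpected format
        name = match.group(1)
        _active, pending, _completed, blocked, _all_time = map(int, match.groups()[1:])
        if pending > 0 or blocked > 0:
            flagged[name] = {"pending": pending, "blocked": blocked}
    return flagged
```

Pending tasks in pools such as `ReadStage` or `Native-Transport-Requests` indicate that requests are queuing on the node, which typically shows up to clients as increased latency and timeouts.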

Repair

Increase the capacity of the Cassandra cluster by adding more nodes to distribute the load and reduce query latency.
