---
id: 2db3e656-616d-11ee-8c99-0242ac120002
---

# Cassandra Coordinator Query Latency Causing Timeout
---

This incident occurs when the coordinator node in a Cassandra cluster responds to queries slowly and client requests time out. In Cassandra, the node that receives a client request acts as the coordinator for that request: it routes the query to the replicas that own the data and returns the result to the client. If the coordinator cannot complete this work quickly enough, clients experience timeouts and cannot retrieve the data they need. Common causes include high load on the cluster, network problems between nodes, and hardware problems.

### Parameters
```shell
export KEYSPACE="PLACEHOLDER"
export TABLE="PLACEHOLDER"
export PARTITION_KEY="PLACEHOLDER"
export NODETOOL_COMMAND="PLACEHOLDER"
export PATH_TO_CASSANDRA_CONF="PLACEHOLDER"
export NEW_NODE_IP="PLACEHOLDER"
export SEED_NODE_IP="PLACEHOLDER"
```

## Debug

### Check the status of the Cassandra cluster
```shell
nodetool status
```
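On a large cluster it can be easier to print only the nodes that are not in the `UN` (Up/Normal) state. The sketch below assumes the usual `nodetool status` layout, where each node line starts with a two-letter status/state code followed by the node address. `list_unhealthy_nodes` is a hypothetical helper name, not part of Cassandra.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints
# the state code and address of every node that is not Up/Normal (UN).
list_unhealthy_nodes() {
  awk '/^[UD][NLJM][[:space:]]/ && $1 != "UN" { print $1, $2 }'
}

# Only query the live cluster when nodetool is available on this host.
if command -v nodetool >/dev/null 2>&1; then
  nodetool status | list_unhealthy_nodes
fi
```

An empty result means every listed node reported `UN`.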

### List the Cassandra keyspaces and inspect the replication settings of the affected keyspace
```shell
cqlsh -e "DESC KEYSPACES"
cqlsh -e "DESCRIBE KEYSPACE ${KEYSPACE}"
```

### View the Cassandra system log to look for errors related to the coordinator node
```shell
tail -n 100 /var/log/cassandra/system.log
```
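The last 100 lines may not contain the relevant entries on a busy node. As a rough sketch, the log can be filtered for lines that commonly accompany coordinator timeouts. The pattern list and the helper name `filter_timeout_lines` are assumptions for illustration; the exact messages depend on the Cassandra version.

```shell
# Hypothetical helper: keeps only log lines that commonly accompany
# coordinator timeouts. The pattern list is an assumption; adjust it to
# the messages your Cassandra version actually logs.
filter_timeout_lines() {
  grep -iE 'timed out|timeout|dropped|GCInspector'
}

# Scan a larger window of the log, if it is readable on this host.
LOG_FILE=/var/log/cassandra/system.log
if [ -r "$LOG_FILE" ]; then
  tail -n 1000 "$LOG_FILE" | filter_timeout_lines
fi
```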

### Check the load on the Cassandra coordinator node
```shell
nodetool tpstats
nodetool proxyhistograms
```
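The thread-pool table printed by `nodetool tpstats` is long, and the pools worth attention are usually the ones with pending or blocked tasks. The following sketch filters for those. It assumes the column order Pool Name, Active, Pending, Completed, Blocked, All time blocked, and `list_backed_up_pools` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `nodetool tpstats` output on stdin and prints
# every thread pool that has pending or currently blocked tasks.
# Assumes columns: Pool Name, Active, Pending, Completed, Blocked, All time blocked.
list_backed_up_pools() {
  awk '
    /^Pool Name/ { in_pools = 1; next }   # header of the thread-pool table
    in_pools && NF == 0 { in_pools = 0 }  # a blank line ends the table
    in_pools && $3 ~ /^[0-9]+$/ && ($3 > 0 || $5 > 0) {
      print $1, "pending=" $3, "blocked=" $5
    }
  '
}

# Only query the live node when nodetool is available on this host.
if command -v nodetool >/dev/null 2>&1; then
  nodetool tpstats | list_backed_up_pools
fi
```

A persistently non-zero pending count on a pool such as `Native-Transport-Requests` suggests the node is receiving requests faster than it can process them.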

### Identify the replica nodes for a partition key so that network latency to them can be checked
```shell
nodetool getendpoints "${KEYSPACE}" "${TABLE}" "${PARTITION_KEY}"
```
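`nodetool getendpoints` only lists the replica addresses; it does not measure latency. One way to get a rough round-trip time is to ping each returned address from the coordinator. This is a minimal sketch that assumes ICMP is allowed between nodes and that `ping` prints its summary on the last line; `ping_endpoints` is a hypothetical helper name.

```shell
# Hypothetical helper: reads one host per line on stdin and prints the
# final summary line of a short ping to each host.
ping_endpoints() {
  while read -r host; do
    [ -n "$host" ] || continue
    summary=$(ping -c 3 -q "$host" </dev/null 2>&1 | tail -n 1)
    printf '%s: %s\n' "$host" "$summary"
  done
}

# Only query the live cluster when nodetool is available on this host.
if command -v nodetool >/dev/null 2>&1; then
  nodetool getendpoints "${KEYSPACE}" "${TABLE}" "${PARTITION_KEY}" | ping_endpoints
fi
```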

### View the Cassandra nodetool output to see if there are any issues with the cluster
```shell
nodetool ${NODETOOL_COMMAND}
```

## Repair

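### Increase the capacity of the Cassandra cluster by adding more nodes to distribute the load and reduce query latency
```shell
#!/bin/bash
# Run this on the NEW node. Cassandra must already be installed there with the
# same cluster_name as the existing cluster.
# Assumes cassandra.yaml still has the stock "- seeds:" and "listen_address:" lines.
set -euo pipefail

CASSANDRA_YAML="${PATH_TO_CASSANDRA_CONF}/cassandra.yaml"

# Stop Cassandra before editing its configuration
sudo service cassandra stop

# Keep a backup of the original configuration
sudo cp "${CASSANDRA_YAML}" "${CASSANDRA_YAML}.bak"

# Point the new node at an existing seed node. Do not list the new node itself
# as a seed: seed nodes skip bootstrap and would not stream any data.
sudo sed -i "s/- seeds:.*/- seeds: \"${SEED_NODE_IP}\"/" "${CASSANDRA_YAML}"

# Bind the new node to its own address
sudo sed -i "s/^listen_address:.*/listen_address: ${NEW_NODE_IP}/" "${CASSANDRA_YAML}"

# Start the new node. auto_bootstrap defaults to true, so the node joins the
# ring and streams its share of the data from the existing nodes.
sudo service cassandra start

# Check cluster status: the new node shows UJ while joining and UN once done
nodetool status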
```
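A newly added node only takes load off the rest of the cluster once it has finished joining. The sketch below polls `nodetool status` until the new node reports `UN`. The helper name `node_state`, the 30-second interval, and the one-hour limit are assumptions chosen for illustration; joining can take much longer on large data sets.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints the
# two-letter state code (for example UJ or UN) of the node with the given address.
node_state() {
  awk -v ip="$1" '$2 == ip { print $1 }'
}

# Poll until the new node is Up/Normal, checking every 30 seconds for up to an hour.
if command -v nodetool >/dev/null 2>&1; then
  for attempt in $(seq 1 120); do
    state=$(nodetool status | node_state "${NEW_NODE_IP}")
    echo "attempt ${attempt}: ${NEW_NODE_IP} is ${state:-not listed}"
    [ "$state" = "UN" ] && break
    sleep 30
  done
fi
```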

