---
id: 3f1c6fc0-616e-11ee-8c99-0242ac120002
---

# Slow Disk in Cassandra Cluster
---

This incident type covers a Cassandra cluster in which one or more disks are running slowly. A slow disk degrades read and write performance on the affected node and can lead to downtime or, in the worst case, data loss. The goal is to identify the specific disk(s) causing the problem and address them so that normal cluster operation is restored.

### Parameters
```shell
export DISK_NAME="PLACEHOLDER"

export INTERVAL="PLACEHOLDER"

export COUNT="PLACEHOLDER"

export THRESHOLD="PLACEHOLDER"
```

## Debug

### Check disk usage and identify high usage disks
```shell
df -h
```
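To narrow the `df` output to the filesystem that actually holds Cassandra data, the check can be pointed at a path. The sketch below is only an illustration: `device_for_path` is a hypothetical helper, and `/var/lib/cassandra/data` is an assumed data directory that should be replaced with the `data_file_directories` value from your `cassandra.yaml`.

```shell
# Hypothetical helper: print the device that backs the filesystem containing
# the given path. `df -P` keeps each filesystem on a single output line.
device_for_path() {
    df -P "$1" | awk 'NR == 2 {print $1}'
}

# Usage (the path is an assumption, adjust it to your installation):
#   device_for_path /var/lib/cassandra/data
```

The device printed here is a candidate value for `DISK_NAME` in the checks that follow.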

### Check disk I/O and identify slow disks
```shell
iostat -x ${DISK_NAME} ${INTERVAL} ${COUNT}
```
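The extended `iostat` report is wide, and the position of the `%util` column differs between sysstat versions. As a rough sketch, the hypothetical helper `iostat_util` below finds the column by its header name and prints only the device name and its `%util` value for every report it reads. Note that the first report from `iostat` is an average since boot, so later reports are the ones that reflect current load.

```shell
# Hypothetical helper: read `iostat -x` output on stdin and print
# "<device> <%util>" for each device line of each report.
iostat_util() {
    awk '
        # Header line: remember which field holds %util
        /^Device/ { col = 0; for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
        # Device line that follows a header
        col && NF >= col { print $1, $col }
        # A blank line ends the current report
        NF == 0 { col = 0 }
    '
}

# Usage (on a node with sysstat installed):
#   iostat -dx "${DISK_NAME}" "${INTERVAL}" "${COUNT}" | iostat_util
```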

### Check disk read and write performance
```shell
hdparm -Tt /dev/${DISK_NAME}
```

### Check for errors in the system log related to disk I/O
```shell
dmesg | grep ${DISK_NAME}
```
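A plain `grep` for the disk name also returns routine messages such as device attachment. One possible refinement, shown here as a sketch, is to keep only lines that mention the disk together with a failure keyword. The helper name `disk_errors` and the keyword list are assumptions, not an exhaustive set of kernel messages.

```shell
# Hypothetical helper: filter kernel log lines on stdin, keeping those that
# mention the given disk and one of a few common failure keywords.
disk_errors() {
    grep -F -- "$1" | grep -iE 'error|timeout|reset|fail'
}

# Usage:
#   dmesg -T | disk_errors "${DISK_NAME}"
```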

### Check for disk errors and bad sectors
```shell
smartctl -a /dev/${DISK_NAME}
```
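The full `smartctl -a` report is long. As a sketch, the hypothetical helper `smart_bad_sectors` below reads the attribute table printed by `smartctl -A` and reports a few sector-related counters when their raw value is non-zero. It assumes the ATA attribute table layout (attribute name in field 2, raw value in field 10); NVMe devices report health in a different format and are not covered.

```shell
# Hypothetical helper: print "<attribute> <raw value>" for selected
# sector-related SMART attributes whose raw value is greater than zero.
smart_bad_sectors() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 { print $2, $10 }'
}

# Usage:
#   smartctl -A "/dev/${DISK_NAME}" | smart_bad_sectors
```

No output means none of the selected counters is above zero; it does not by itself prove that the disk is healthy.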

### Check for file system errors and corruption
```shell
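# -n reports problems without changing anything; repairing a mounted
# filesystem can corrupt it, so unmount before running fsck without -n
fsck -n /dev/${DISK_NAME}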
```

## Repair

### Identify the specific disk(s) causing the issue by monitoring disk usage and performance metrics
```shell
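#!/bin/bash
# Save as identify_disk_usage.sh, then run:
#   chmod +x identify_disk_usage.sh && ./identify_disk_usage.sh

# %util value above which a disk is reported as busy
threshold=${THRESHOLD}

# Get a list of disks in use (sd* devices only)
disks=$(iostat -d | awk '/^sd/ {print $1}')

# Loop through the disks and check their I/O usage
for disk in $disks; do
    # Take two reports and keep the second: the first is the since-boot average.
    # %util is the last column of the extended (-x) report.
    usage=$(iostat -dx "$disk" 1 2 | awk -v d="$disk" '$1 == d {u = $NF} END {print u}')

    # %util is a decimal, so compare in awk instead of with [ -gt ]
    if awk -v u="$usage" -v t="$threshold" 'BEGIN {exit !(u > t)}'; then
        echo "Disk $disk is experiencing high I/O usage (${usage}% util)"
    fi
done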
```






