Runbook
Elasticsearch Instability and Cluster Failures
Overview
This incident type covers frequent or unexpected instability and cluster failures in Elasticsearch, a distributed search and analytics engine. These issues degrade system performance and can lead to downtime and potential data loss. Causes vary and include hardware failure, software bugs, network issues, and configuration errors. Address these incidents quickly to minimize their impact and restore the stability and reliability of the system.
Debug
Check Elasticsearch cluster health
Check Elasticsearch cluster state
Check Elasticsearch node stats
Check Elasticsearch node info
Check Elasticsearch index health
Check Elasticsearch index stats
Check Elasticsearch shard allocation
Check Elasticsearch logs for errors
Restart Elasticsearch service
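The read-only checks listed under Debug (cluster health and state, node stats and info, index health and stats, shard allocation) can be sketched against the Elasticsearch REST API as shown here. This is a minimal illustration, not a definitive procedure: it assumes an unauthenticated cluster reachable at `http://localhost:9200`, and the names `ES_URL`, `DEBUG_ENDPOINTS`, `fetch`, `summarize_health`, `unassigned_shards`, and `run_debug_checks` are hypothetical helpers introduced for this example. Checking logs and restarting the service depend on how Elasticsearch was installed, so they are left out.

```python
import json
import urllib.request

# Assumption: an unauthenticated cluster on the default HTTP port.
ES_URL = "http://localhost:9200"

# Read-only REST endpoints, one per Debug step that inspects the cluster.
DEBUG_ENDPOINTS = {
    "cluster health": "/_cluster/health",
    "cluster state": "/_cluster/state/master_node,nodes",
    "node stats": "/_nodes/stats",
    "node info": "/_nodes",
    "index health": "/_cluster/health?level=indices",
    "index stats": "/_stats",
    "shard allocation": "/_cat/shards?format=json"
                        "&h=index,shard,prirep,state,unassigned.reason",
}


def fetch(path, base_url=ES_URL, timeout=10):
    """GET one endpoint and decode the JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health):
    """Turn a /_cluster/health response into a one-line verdict."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    nodes = health.get("number_of_nodes", 0)
    if status == "green":
        verdict = "all shards allocated"
    elif status == "yellow":
        verdict = "some replica shards unallocated"
    elif status == "red":
        verdict = "at least one primary shard unallocated"
    else:
        verdict = "status not reported"
    return f"{status}: {verdict} ({nodes} nodes, {unassigned} unassigned shards)"


def unassigned_shards(shards):
    """Keep only the rows of a /_cat/shards listing that are UNASSIGNED."""
    return [s for s in shards if s.get("state") == "UNASSIGNED"]


def run_debug_checks(base_url=ES_URL):
    """Query every endpoint; record errors instead of stopping at the first."""
    results = {}
    for name, path in DEBUG_ENDPOINTS.items():
        try:
            results[name] = fetch(path, base_url)
        except (OSError, ValueError) as exc:  # network failure or bad JSON
            results[name] = {"error": str(exc)}
    return results
```

A typical use would be to call `run_debug_checks()`, pass the `"cluster health"` entry to `summarize_health`, and pass the `"shard allocation"` entry to `unassigned_shards` to see which shards need attention before deciding on a restart. Collecting errors per endpoint rather than raising is a deliberate choice here, since an unstable cluster may answer some requests and time out on others.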
Repair