Runbook

High error rate on NGINX incident


Overview

A "High error rate on NGINX" incident fires when the error rate on the NGINX server has stayed above 1% over the last 5 minutes. This can degrade performance or take the affected service down, which hurts user experience and can cost revenue. Treat the incident as urgent and resolve it quickly to limit the impact on users.
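The threshold above can be written down as a small check. This is a minimal sketch, not the alerting rule itself: the function names are hypothetical, and it assumes that "error" means an HTTP 5xx response, which this runbook does not state explicitly.

```python
def error_rate(total_requests: int, error_responses: int) -> float:
    """Return the fraction of requests that ended in an error (0.0 to 1.0)."""
    if total_requests <= 0:
        return 0.0  # no traffic in the window, so nothing to alert on
    return error_responses / total_requests


def should_alert(total_requests: int, error_responses: int,
                 threshold: float = 0.01) -> bool:
    """True when the error rate in the window is strictly above the threshold."""
    return error_rate(total_requests, error_responses) > threshold


# Example: 12,000 requests in the last 5 minutes, 180 of them errors.
print(f"{error_rate(12_000, 180):.2%}")  # 1.50%
print(should_alert(12_000, 180))         # True
```

Both counts must come from the same 5-minute window for the result to match the alert condition.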

Parameters

Debug

Check the status of the NGINX service
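One way to script this step, assuming the host uses systemd and the unit is named `nginx` (both are assumptions; adjust for your init system). The helper name `nginx_is_active` is hypothetical. It relies on `systemctl is-active`, which exits with status 0 only when the unit is active.

```python
import subprocess


def nginx_is_active(unit: str = "nginx", run=subprocess.run) -> bool:
    """Return True when `systemctl is-active <unit>` exits with status 0.

    `run` is injectable so the check can be exercised without systemd.
    """
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0
```

Calling `nginx_is_active()` on the affected host returns `False` when the service is stopped or failed; `systemctl status nginx` then shows the most recent journal lines for the unit.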

Check the error log for NGINX
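To get a quick picture of what the error log contains, the entries can be tallied by severity. The sketch below assumes the standard NGINX error log layout (`YYYY/MM/DD HH:MM:SS [level] pid#tid: message`); the sample lines are made up for illustration, and on a real host the lines would come from the error log file, commonly `/var/log/nginx/error.log` (the path depends on the `error_log` directive).

```python
import re
from collections import Counter

# Timestamp followed by the bracketed severity, e.g. "[error]" or "[warn]".
LEVEL_RE = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[(\w+)\]")


def count_levels(lines):
    """Count error log entries per severity level, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample lines, not taken from a real incident.
sample = [
    "2023/05/01 12:00:01 [error] 1234#1234: *5 connect() failed "
    "(111: Connection refused) while connecting to upstream",
    "2023/05/01 12:00:02 [warn] 1234#1234: *6 an upstream response "
    "is buffered to a temporary file",
    "2023/05/01 12:00:03 [error] 1234#1234: *7 upstream timed out "
    "(110: Connection timed out) while reading response header from upstream",
]
print(count_levels(sample).most_common())  # [('error', 2), ('warn', 1)]
```

A burst of `[error]` entries that mention the upstream usually points at the backend rather than at NGINX itself.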

Check the access log for NGINX
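The access log shows which responses are failing. The following sketch groups responses by status class and computes the 5xx share. It assumes the default `combined` log format, where the status code follows the quoted request line; a custom `log_format` would need a different pattern. The sample lines and the name `status_classes` are illustrative.

```python
import re
from collections import Counter

# Quoted request line followed by the 3-digit status code ("combined" format).
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def status_classes(lines):
    """Count responses per status class ("2xx", "4xx", "5xx", ...)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)[0] + "xx"] += 1
    return counts


# Illustrative sample lines, not taken from a real incident.
sample = [
    '203.0.113.7 - - [01/May/2023:12:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
    '203.0.113.8 - - [01/May/2023:12:00:02 +0000] "GET /api HTTP/1.1" 502 157 "-" "curl/8.0"',
    '203.0.113.9 - - [01/May/2023:12:00:03 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
    '203.0.113.7 - - [01/May/2023:12:00:04 +0000] "GET /nope HTTP/1.1" 404 153 "-" "curl/8.0"',
]
counts = status_classes(sample)
total = sum(counts.values())
print(counts["5xx"] / total)  # 0.25
```

On a real host, pass an open file handle for the access log (often `/var/log/nginx/access.log`, depending on the `access_log` directive) instead of `sample`.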

Check the NGINX configuration file for syntax errors
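NGINX can validate its own configuration with `nginx -t`, which exits non-zero on a syntax error and writes its diagnostics to standard error. A minimal wrapper, assuming the `nginx` binary is on `PATH` and the caller has permission to read the configuration (the function name is hypothetical):

```python
import subprocess


def config_is_valid(run=subprocess.run):
    """Run `nginx -t` and return (ok, message).

    `ok` is True when the exit status is 0; `message` is the text that
    nginx wrote to stderr. `run` is injectable for testing.
    """
    result = run(["nginx", "-t"], capture_output=True, text=True, check=False)
    return result.returncode == 0, result.stderr.strip()
```

When `ok` is `False`, the message normally names the file and line of the first problem, so it is worth printing before attempting a reload.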

Check the NGINX configuration file for errors in the upstream server configuration

Check the system load average
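Load average is easier to judge relative to the number of CPU cores. The sketch below uses `os.getloadavg()` (Unix only) and a rule of thumb that a 1-minute load above 1.0 per core suggests saturation; that limit is an assumption, not a value from this runbook, and the function names are hypothetical.

```python
import os


def load_per_core(load_1min: float, cores: int) -> float:
    """Normalize the 1-minute load average by the number of CPU cores."""
    return load_1min / max(cores, 1)


def is_overloaded(load_1min: float, cores: int, limit: float = 1.0) -> bool:
    """True when the per-core load exceeds the (assumed) limit."""
    return load_per_core(load_1min, cores) > limit


def current_load_per_core() -> float:
    """Read the live 1-minute load average and normalize it (Unix only)."""
    load_1min, _load_5min, _load_15min = os.getloadavg()
    return load_per_core(load_1min, os.cpu_count() or 1)


# Example with fixed numbers: a load of 8.0 on a 4-core host.
print(load_per_core(8.0, 4))  # 2.0
print(is_overloaded(8.0, 4))  # True
```

Call `current_load_per_core()` on the NGINX host to apply the same normalization to live values.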

Check the CPU usage of the NGINX process

Check the memory usage of the NGINX process
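On Linux, the resident memory of a process can be read from `/proc/<pid>/status` without extra tooling. This sketch parses the `VmRSS` field; it assumes a Linux host, and the caller has to supply the PID (for example from the NGINX pid file, whose location depends on the `pid` directive). The helper names are hypothetical.

```python
import re


def rss_kib(status_text: str):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text.

    Returns None when the field is missing, as it is for kernel threads.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int):
    """Read and parse /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return rss_kib(handle.read())


# Example with a trimmed, made-up status file.
print(rss_kib("Name:\tnginx\nVmRSS:\t    5120 kB\n"))  # 5120
```

NGINX runs one master and several worker processes, so the total footprint is the sum over all of their PIDs.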

Check the network traffic on the NGINX server

Check the NGINX configuration for the maximum number of connections allowed
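The upper bound on simultaneous connections is `worker_processes` multiplied by `worker_connections`. The sketch below reads both directives from configuration text and falls back to the NGINX defaults (1 worker process, 512 connections per worker) when a directive is absent. It is deliberately simple: it strips `#` comments naively and does not follow `include` files, and the function name is hypothetical.

```python
import re


def max_connections(config_text: str, cpu_cores: int = 1) -> int:
    """Estimate worker_processes * worker_connections from nginx.conf text.

    `cpu_cores` is used when worker_processes is set to "auto".
    """
    text = re.sub(r"#.*", "", config_text)  # drop comments to end of line
    procs = re.search(r"\bworker_processes\s+(\d+|auto)\s*;", text)
    conns = re.search(r"\bworker_connections\s+(\d+)\s*;", text)

    if procs is None:
        workers = 1  # NGINX default
    elif procs.group(1) == "auto":
        workers = cpu_cores
    else:
        workers = int(procs.group(1))

    per_worker = int(conns.group(1)) if conns else 512  # NGINX default
    return workers * per_worker


sample_conf = """
worker_processes 4;
events {
    worker_connections 1024;
}
"""
print(max_connections(sample_conf))  # 4096
```

Note that `worker_connections` counts every connection a worker holds, including those to proxied upstream servers, so a reverse proxy can serve fewer clients than this figure suggests.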

Repair

Define variables

Add more instances to serve increased load

Wait for the new instances to start running

Get the IDs of the new instances

Register the new instances with the target group
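The repair steps above can be sketched with boto3, assuming the service runs on AWS EC2 behind an Elastic Load Balancing target group. This runbook does not name the cloud provider, so that is an assumption, as are the parameter names (`image_id`, `instance_type`, `target_group_arn`) and the function names. In this sketch the instance IDs come straight from the launch response, so they are collected before waiting rather than after.

```python
def add_instances(ec2, elbv2, image_id: str, instance_type: str,
                  target_group_arn: str, count: int):
    """Launch `count` instances, wait until they run, register them as targets.

    `ec2` and `elbv2` are boto3 clients (or compatible stand-ins).
    Returns the list of new instance IDs.
    """
    # Add more instances to serve increased load.
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
    # Get the IDs of the new instances.
    instance_ids = [inst["InstanceId"] for inst in response["Instances"]]

    # Wait for the new instances to reach the "running" state.
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)

    # Register the new instances with the target group.
    elbv2.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": instance_id} for instance_id in instance_ids],
    )
    return instance_ids


def make_clients(region: str):
    """Build real boto3 clients; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the sketch loads without boto3
    return boto3.client("ec2", region_name=region), boto3.client("elbv2", region_name=region)
```

A real launch usually needs more arguments (subnet, security groups, or a launch template), and if the instances belong to an Auto Scaling group attached to the target group, raising the group's desired capacity is the safer route because registration then happens automatically.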
