This incident occurs when the network bandwidth on one or more Kafka nodes is fully saturated, degrading the performance of the affected nodes or causing them to fail. Message delivery can be delayed or messages lost, and other nodes that replicate from the affected nodes can be impacted as well. Common causes include increased message traffic, misconfiguration, and hardware issues.
Parameters
Debug
Check network interface statistics
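For this step, a minimal sketch that estimates how close an interface is to saturation, assuming a Linux node where /proc/net/dev is available. The function names, the sampling interval, and the link speed argument are illustrative assumptions, not part of the original runbook.

```python
# Sketch: estimate interface utilization from two /proc/net/dev snapshots.
# Assumes Linux; the caller supplies the link speed, it is not detected here.

def parse_proc_net_dev(text):
    """Return {interface: counters} parsed from the contents of /proc/net/dev."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = [int(value) for value in data.split()]
        stats[name.strip()] = {
            "rx_bytes": fields[0],   # receive: bytes
            "rx_drop": fields[3],    # receive: dropped packets
            "tx_bytes": fields[8],   # transmit: bytes
            "tx_drop": fields[11],   # transmit: dropped packets
        }
    return stats


def utilization_percent(before, after, interval_s, link_mbps):
    """Utilization of the busier direction, as a percentage of link speed."""
    moved = max(after["rx_bytes"] - before["rx_bytes"],
                after["tx_bytes"] - before["tx_bytes"])
    return moved * 8 / interval_s / (link_mbps * 1_000_000) * 100
```

To use it, read /proc/net/dev twice a few seconds apart, parse both snapshots, and pass the counters for one interface to utilization_percent. Sustained values near 100, or growing drop counters, are consistent with saturation.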
Check network connections
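For this step, a small sketch that summarizes the text printed by `ss -tn`, counting connections per peer and flagging sockets with a large send queue. It assumes the usual column order (State, Recv-Q, Send-Q, local address, peer address); the function name and the threshold are hypothetical.

```python
from collections import Counter


def summarize_ss(text, send_q_threshold=100_000):
    """Summarize `ss -tn` output.

    Returns (connections per peer host, [(peer, send_q)] for backlogged sockets).
    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    per_peer = Counter()
    backlogged = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows: State Recv-Q Send-Q Local:Port Peer:Port
        if len(parts) < 5 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # header row or a line in an unexpected format
        send_q = int(parts[2])
        peer_host = parts[4].rsplit(":", 1)[0]  # strip the port
        per_peer[peer_host] += 1
        if send_q >= send_q_threshold:
            backlogged.append((parts[4], send_q))
    return per_peer, backlogged
```

A persistently large Send-Q toward another broker suggests that data is queued faster than the network can carry it.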
Check network bandwidth usage for each process
Check network latency between nodes
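For this step, one rough way to gauge latency without extra tooling is to time a TCP connection to another broker's listener port. This is a sketch under assumptions: the function name is hypothetical, and the result reflects TCP handshake time (about one round trip plus local overhead), not an ICMP ping.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, timeout=2.0):
    """Return the time in milliseconds taken to open a TCP connection.

    Raises OSError (including timeouts) if the connection cannot be made.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0
```

Calling it several times against each peer broker and comparing the results with a known-good baseline helps separate a congested path from a single slow sample.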
Check network throughput between nodes
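For this step, throughput between two nodes is commonly measured with iperf3. The sketch below only parses a report, assuming the test was run as a TCP client with JSON output (`iperf3 -c <host> -J`) and that the report carries `end.sum_sent` and `end.sum_received` sections; the function name is hypothetical.

```python
import json


def iperf3_throughput_mbps(json_text):
    """Extract sender and receiver throughput in Mbit/s from an iperf3 JSON report."""
    report = json.loads(json_text)
    end = report["end"]
    return {
        "sent_mbps": end["sum_sent"]["bits_per_second"] / 1e6,
        "received_mbps": end["sum_received"]["bits_per_second"] / 1e6,
    }
```

A measured rate well below the nominal link speed while Kafka is running, compared with a higher rate when it is idle, points to the broker traffic itself consuming the bandwidth.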
Check Kafka node status
Check Kafka logs
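For this step, a sketch that counts log lines hinting at network trouble. It assumes the common broker log layout of `[timestamp] LEVEL message`; the search phrases in PATTERNS are illustrative guesses and should be adapted to what the brokers actually log.

```python
import re
from collections import Counter

# Assumed layout: "[2024-01-01 10:00:00,000] WARN message text (logger)"
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")

# Hypothetical phrases to look for; matched case-insensitively.
PATTERNS = {
    "timeouts": "timed out",
    "disconnects": "disconnected",
    "isr_shrinks": "shrinking isr",
}


def count_network_symptoms(lines, patterns=PATTERNS):
    """Count log lines whose message contains each phrase in `patterns`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # stack traces and continuation lines are skipped
        message = match.group("msg").lower()
        for label, needle in patterns.items():
            if needle in message:
                counts[label] += 1
    return counts
```

Passing an open log file to count_network_symptoms gives a quick tally; a rising count of timeouts or ISR changes during the incident window is consistent with replication falling behind.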
Check Kafka node configuration
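For this step, a sketch that pulls the network-related settings out of a broker's server.properties text. The keys listed are standard Kafka broker settings, but the selection itself is an assumption about what is worth reviewing, and the function names are hypothetical.

```python
# Broker settings that influence network usage (an illustrative selection).
NETWORK_KEYS = (
    "num.network.threads",
    "socket.send.buffer.bytes",
    "socket.receive.buffer.bytes",
    "num.replica.fetchers",
    "replica.fetch.max.bytes",
    "compression.type",
)


def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def network_settings(text, keys=NETWORK_KEYS):
    """Return the chosen keys; None means the key is unset in the file."""
    props = parse_properties(text)
    return {key: props.get(key) for key in keys}
```

A value of None means the file does not set the key, so the broker default applies.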
Repair
Optimize network configuration: Tune network settings on the affected nodes, such as socket buffer sizes, so that they handle the traffic better, or change the network topology to reduce congestion.
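As one illustration of tuning, socket buffers are often sized from the bandwidth-delay product (BDP) of the link between brokers. The sketch below is a rule-of-thumb calculation under assumed inputs (link speed and round-trip time); the function name is hypothetical and the result is a starting point, not a prescribed value.

```python
def bdp_bytes(link_mbps, rtt_ms):
    """Bandwidth-delay product in bytes: data in flight needed to keep the link full."""
    bits_per_second = link_mbps * 1_000_000
    return int(bits_per_second * rtt_ms / 1000 / 8)
```

For a 10 Gbit/s link with a 1 ms round trip, bdp_bytes(10_000, 1.0) gives 1,250,000 bytes. A figure like this can inform the broker's socket.send.buffer.bytes and socket.receive.buffer.bytes settings, keeping in mind that the operating system's own socket buffer limits may cap the effective size.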