This incident type occurs when nodes in a Kubernetes cluster become network-unavailable, meaning they cannot be reached over the network. Possible causes include a misconfiguration, route exhaustion, or a physical fault in the network connection to the node hardware. It is a high-urgency incident that requires immediate attention to restore network connectivity to the affected nodes.
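One way to confirm which nodes are affected is to inspect each node's status conditions. Kubernetes nodes carry a `NetworkUnavailable` condition (set by some network plugins and cloud providers) alongside the `Ready` condition. The Python sketch below assumes `kubectl` is installed and configured for the cluster; `load_nodes_from_kubectl` and `unreachable_nodes` are hypothetical helper names, not part of any library.

```python
import json
import subprocess
from typing import Any


def load_nodes_from_kubectl() -> dict[str, Any]:
    """Run `kubectl get nodes -o json` and return the parsed NodeList.

    Assumes kubectl is on PATH and points at the affected cluster.
    """
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def unreachable_nodes(node_list: dict[str, Any]) -> list[str]:
    """Return names of nodes reporting NetworkUnavailable=True or Ready != True."""
    flagged = []
    for node in node_list.get("items", []):
        # Map each condition type (e.g. "Ready") to its status string.
        conditions = {
            c["type"]: c["status"]
            for c in node.get("status", {}).get("conditions", [])
        }
        if conditions.get("NetworkUnavailable") == "True" or conditions.get("Ready") != "True":
            flagged.append(node["metadata"]["name"])
    return flagged
```

Usage: `print(unreachable_nodes(load_nodes_from_kubectl()))`. A node whose `Ready` status is `Unknown` usually means the control plane has lost contact with its kubelet, which is consistent with a network outage on that node.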
Debug
Check whether the Kubernetes nodes are available
Check the status and conditions of each Kubernetes node
Check the network configuration of each Kubernetes node
Check whether any pods are failing because of network issues
Check the status of the Kubernetes network components (for example, the CNI plugin pods and kube-proxy)
Check whether any network policies could be blocking traffic
Check for issues with the affected Kubernetes Services
Check for issues with the affected Kubernetes Endpoints
Check for issues with the Kubernetes Ingress
Check whether firewall or security group settings are blocking network traffic on the affected nodes
Check for routing issues in the cluster
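Two of the checks above can be scripted against `kubectl ... -o json` output: finding pods on the affected nodes that are not Ready, and finding NetworkPolicies that allow no ingress at all to the pods they select. The sketch below assumes the inputs are the parsed JSON from `kubectl get pods --all-namespaces -o json` and `kubectl get networkpolicies --all-namespaces -o json`; `failing_pods_on_nodes` and `deny_all_ingress_policies` are hypothetical helper names.

```python
from typing import Any


def failing_pods_on_nodes(pod_list: dict[str, Any], nodes: set[str]) -> list[str]:
    """Return "namespace/name" for pods on the given nodes that are not Ready.

    Pods in phase Succeeded (e.g. completed Jobs) are skipped because they
    are expected to be not Ready.
    """
    failing = []
    for pod in pod_list.get("items", []):
        if pod.get("spec", {}).get("nodeName") not in nodes:
            continue
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":
            continue
        ready = {c["type"]: c["status"] for c in status.get("conditions", [])}.get("Ready")
        if ready != "True":
            meta = pod["metadata"]
            failing.append(f"{meta['namespace']}/{meta['name']}")
    return failing


def deny_all_ingress_policies(netpol_list: dict[str, Any]) -> list[str]:
    """Return "namespace/name" for NetworkPolicies that permit no ingress
    to the pods they select."""
    flagged = []
    for pol in netpol_list.get("items", []):
        spec = pol.get("spec", {})
        # When policyTypes is omitted, Kubernetes treats the policy as
        # Ingress, plus Egress if it has egress rules.
        types = spec.get("policyTypes") or (
            ["Ingress"] + (["Egress"] if spec.get("egress") else [])
        )
        # An Ingress policy with no ingress rules denies all ingress traffic.
        if "Ingress" in types and not spec.get("ingress"):
            meta = pol["metadata"]
            flagged.append(f"{meta['namespace']}/{meta['name']}")
    return flagged
```

A deny-all policy is often intentional, so a flagged policy is a lead to review alongside any allow rules in the same namespace, not proof of a misconfiguration.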
Repair
Verify that the routing tables on each node are configured correctly so that the nodes can reach each other.
Identify any network security policies that block traffic between nodes, and adjust them to allow the required traffic.
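To check routing on a given node, one approach is to compare each peer node's `spec.podCIDR` (from `kubectl get nodes -o json`) with that node's routing table as reported by `ip -j route show`, which is iproute2's JSON output. This is a sketch that assumes per-node pod CIDR routes (for example, kubenet or host-gateway style routing); overlay or BGP-based network plugins may program routes differently, so a "missing" route is not necessarily an error there. `load_routes` and `missing_pod_cidr_routes` are hypothetical helper names.

```python
import ipaddress
import json
import subprocess
from typing import Any


def load_routes() -> list[dict[str, Any]]:
    """Return this host's routes as parsed from `ip -j route show`."""
    out = subprocess.run(
        ["ip", "-j", "route", "show"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def missing_pod_cidr_routes(
    node_list: dict[str, Any], routes: list[dict[str, Any]], local_node: str
) -> dict[str, str]:
    """Map peer node name -> podCIDR for peers with no covering route.

    The default route is ignored, since it would "cover" every destination.
    Assumes per-node pod CIDR routing (see the lead-in).
    """
    route_nets = []
    for r in routes:
        dst = r.get("dst")
        if not dst or dst == "default":
            continue
        try:
            route_nets.append(ipaddress.ip_network(dst, strict=False))
        except ValueError:
            continue  # skip destinations that are not plain IP prefixes
    missing = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        cidr = node.get("spec", {}).get("podCIDR")
        if name == local_node or not cidr:
            continue
        pod_net = ipaddress.ip_network(cidr, strict=False)
        # A route covers the pod CIDR if its prefix equals or contains it.
        covered = any(
            net.version == pod_net.version and net.supernet_of(pod_net)
            for net in route_nets
        )
        if not covered:
            missing[name] = cidr
    return missing
```

Run this on an affected node with that node's own name as `local_node`; any peer listed in the result has no route to its pods from this host.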