Runbook

Troubleshooting connectivity issues between tasks in an Amazon ECS cluster using service discovery

Overview

This incident type involves connectivity issues between tasks in an Amazon ECS (Elastic Container Service) cluster that uses service discovery. Service discovery lets services find and reach one another by name, without needing to know each other's IP addresses. When tasks cannot connect, the areas to investigate include the service discovery configuration, DNS resolution, the task definition and its network mode, security groups, the task IAM role, VPC configuration, the ECS agent, ECS service event messages, logs, application-level configuration, and health checks. The Debug steps walk through several of these checks, and the Repair steps cover two fixes.

Debug

Confirm that the ECS service is associated with a Service Discovery namespace
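One way to run this check is a minimal Python sketch, assuming the boto3 SDK and configured AWS credentials. The helper names are hypothetical. It reads the `serviceRegistries` field that the ECS `DescribeServices` API returns for each service; an empty list suggests the service is not attached to any Cloud Map registry.

```python
def registry_arns(describe_services_response):
    """Map each service name to the Cloud Map registry ARNs attached to it."""
    return {
        svc["serviceName"]: [
            reg.get("registryArn") for reg in svc.get("serviceRegistries", [])
        ]
        for svc in describe_services_response.get("services", [])
    }


def fetch_registry_arns(cluster, service):
    """Query ECS for one service (hypothetical wrapper; needs boto3 and credentials)."""
    import boto3  # imported lazily so registry_arns() works without the SDK

    ecs = boto3.client("ecs")
    response = ecs.describe_services(cluster=cluster, services=[service])
    return registry_arns(response)
```

Calling `fetch_registry_arns("my-cluster", "my-service")` (placeholder names) returns a dictionary; a service mapped to an empty list has no service registry configured.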

Check that the tasks' DNS records are correctly registered in AWS Cloud Map
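A possible way to compare what Cloud Map has registered against the IP addresses of the running tasks is sketched below, assuming boto3 and a known Cloud Map service ID. The helper names are hypothetical, and the sketch reads only the first page of `ListInstances` results.

```python
def registered_ips(list_instances_response):
    """Collect the IPv4 addresses that Cloud Map has registered for a service."""
    ips = set()
    for instance in list_instances_response.get("Instances", []):
        ip = instance.get("Attributes", {}).get("AWS_INSTANCE_IPV4")
        if ip:
            ips.add(ip)
    return ips


def missing_from_cloud_map(list_instances_response, task_ips):
    """Return task IPs that have no matching Cloud Map instance, sorted."""
    return sorted(set(task_ips) - registered_ips(list_instances_response))


def fetch_instances(service_id):
    """Fetch the first page of instances (hypothetical wrapper; needs boto3)."""
    import boto3  # lazy import keeps the pure helpers usable without the SDK

    client = boto3.client("servicediscovery")
    return client.list_instances(ServiceId=service_id)
```

If `missing_from_cloud_map` returns a non-empty list, those tasks are running but were not registered, so they will not resolve through the namespace.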

Review the task definition and check the network mode
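The network mode matters because it limits which DNS record types ECS service discovery can create. As I read the ECS documentation, `awsvpc` tasks can use A or SRV records, while `bridge` and `host` tasks support SRV records only. The sketch below encodes that reading. It assumes boto3, uses hypothetical helper names, and treats a missing `networkMode` as `bridge`, which is an assumption that matches the default for Linux container instances.

```python
# Record types each network mode can register, per the assumption stated above.
SUPPORTED_RECORD_TYPES = {
    "awsvpc": {"A", "SRV"},
    "bridge": {"SRV"},
    "host": {"SRV"},
    "none": set(),
}


def network_mode(describe_task_definition_response):
    """Return the task definition's network mode, assuming 'bridge' when unset."""
    task_definition = describe_task_definition_response["taskDefinition"]
    return task_definition.get("networkMode", "bridge")


def record_type_supported(describe_task_definition_response, record_type):
    """Check whether the given DNS record type fits the task's network mode."""
    mode = network_mode(describe_task_definition_response)
    return record_type in SUPPORTED_RECORD_TYPES.get(mode, set())


def fetch_task_definition(task_definition):
    """Fetch a task definition (hypothetical wrapper; needs boto3)."""
    import boto3  # lazy import keeps the pure helpers usable without the SDK

    return boto3.client("ecs").describe_task_definition(taskDefinition=task_definition)
```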

Confirm that the tasks are launched in the expected subnets
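For tasks that use the `awsvpc` network mode, the `DescribeTasks` response lists each task's network interface attachment, including a `subnetId` detail. The following sketch, with hypothetical helper names and boto3 assumed, extracts those subnet IDs and flags tasks that landed outside an expected set.

```python
def task_subnets(describe_tasks_response):
    """Map each task ARN to the subnet ID found in its attachment details."""
    subnets = {}
    for task in describe_tasks_response.get("tasks", []):
        for attachment in task.get("attachments", []):
            for detail in attachment.get("details", []):
                if detail.get("name") == "subnetId":
                    subnets[task["taskArn"]] = detail.get("value")
    return subnets


def tasks_outside_subnets(describe_tasks_response, expected_subnets):
    """Return tasks whose subnet is not in the expected set."""
    return {
        arn: subnet
        for arn, subnet in task_subnets(describe_tasks_response).items()
        if subnet not in expected_subnets
    }


def fetch_tasks(cluster, task_arns):
    """Describe the given tasks (hypothetical wrapper; needs boto3)."""
    import boto3  # lazy import keeps the pure helpers usable without the SDK

    return boto3.client("ecs").describe_tasks(cluster=cluster, tasks=task_arns)
```

Tasks in `bridge` or `host` mode have no such attachment; for those, the subnet is that of the container instance they run on.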

Ensure health checks are correctly set up
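As a rough sketch of this check, the helpers below look for containers that define no health check command in the task definition and for tasks that ECS currently reports as `UNHEALTHY`. The function names are hypothetical; the inputs are assumed to be the responses from the ECS `DescribeTaskDefinition` and `DescribeTasks` APIs.

```python
def containers_without_health_check(describe_task_definition_response):
    """List container names whose definition has no health check command."""
    task_definition = describe_task_definition_response["taskDefinition"]
    return [
        container["name"]
        for container in task_definition.get("containerDefinitions", [])
        if not container.get("healthCheck", {}).get("command")
    ]


def unhealthy_tasks(describe_tasks_response):
    """List ARNs of tasks whose reported health status is UNHEALTHY."""
    return [
        task["taskArn"]
        for task in describe_tasks_response.get("tasks", [])
        if task.get("healthStatus") == "UNHEALTHY"
    ]
```

A task with no container health check reports a health status of `UNKNOWN`, so an empty result from `unhealthy_tasks` does not by itself prove the application is responding.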

Review security groups associated with the task or service to make sure inbound and outbound traffic is allowed between tasks
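The review can be partly automated. The sketch below, using hypothetical helper names, takes one entry from the `SecurityGroups` list returned by the EC2 `DescribeSecurityGroups` API and tests whether its inbound rules admit traffic on a port from a given source security group or source IP address. It covers IPv4 CIDR ranges and security group references only; prefix lists and IPv6 ranges are left out.

```python
import ipaddress


def rule_covers_port(permission, port, protocol="tcp"):
    """Return True if the rule's protocol and port range include the port."""
    if permission.get("IpProtocol") == "-1":  # "-1" means all protocols and ports
        return True
    if permission.get("IpProtocol") != protocol:
        return False
    return permission.get("FromPort", 0) <= port <= permission.get("ToPort", 65535)


def allows_inbound(security_group, port, source_group_id=None, source_ip=None,
                   protocol="tcp"):
    """Check inbound rules against a source security group and/or source IP."""
    for permission in security_group.get("IpPermissions", []):
        if not rule_covers_port(permission, port, protocol):
            continue
        if source_group_id and any(
            pair.get("GroupId") == source_group_id
            for pair in permission.get("UserIdGroupPairs", [])
        ):
            return True
        if source_ip and any(
            ipaddress.ip_address(source_ip)
            in ipaddress.ip_network(ip_range["CidrIp"], strict=False)
            for ip_range in permission.get("IpRanges", [])
        ):
            return True
    return False
```

Outbound rules sit under `IpPermissionsEgress` in the same response and have the same shape, so the calling task's group can be checked by passing a copy of the group with that list under the `IpPermissions` key.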

Repair

Associate the service with the service registries
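One possible repair, sketched under the assumption that the installed boto3 version accepts a `serviceRegistries` argument on `update_service`, is to attach an existing Cloud Map service to the ECS service. The helper names and all argument values are hypothetical. `containerName` and `containerPort` are included only when supplied, since they are typically needed for SRV records. Changing the registry configuration is expected to start a new deployment, so try it outside production first.

```python
def registry_update_kwargs(cluster, service, registry_arn,
                           container_name=None, container_port=None):
    """Build the keyword arguments for an ECS update_service call."""
    registry = {"registryArn": registry_arn}
    if container_name is not None:
        registry["containerName"] = container_name
    if container_port is not None:
        registry["containerPort"] = container_port
    return {"cluster": cluster, "service": service, "serviceRegistries": [registry]}


def associate_registry(cluster, service, registry_arn,
                       container_name=None, container_port=None):
    """Apply the update (hypothetical wrapper; needs boto3 and credentials)."""
    import boto3  # lazy import keeps the builder usable without the SDK

    kwargs = registry_update_kwargs(
        cluster, service, registry_arn, container_name, container_port
    )
    return boto3.client("ecs").update_service(**kwargs)
```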

Enable DNS resolution for the VPC
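A minimal sketch of this repair, assuming boto3 and a known VPC ID, is shown below. The EC2 `ModifyVpcAttribute` API accepts only one attribute per call, so the hypothetical helper builds two requests: DNS support first, then DNS hostnames, which depends on DNS support being enabled.

```python
def dns_attribute_requests(vpc_id):
    """Build one modify_vpc_attribute request per DNS attribute, in order."""
    return [
        {"VpcId": vpc_id, "EnableDnsSupport": {"Value": True}},
        {"VpcId": vpc_id, "EnableDnsHostnames": {"Value": True}},
    ]


def enable_vpc_dns(vpc_id):
    """Apply both settings (hypothetical wrapper; needs boto3 and credentials)."""
    import boto3  # lazy import keeps the builder usable without the SDK

    ec2 = boto3.client("ec2")
    for request in dns_attribute_requests(vpc_id):
        ec2.modify_vpc_attribute(**request)
```

The current values can be read beforehand with `describe_vpc_attribute`, passing `Attribute="enableDnsSupport"` or `Attribute="enableDnsHostnames"`.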
