AWS ECS Development Rules
6/30/2025
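This rule covers AWS ECS development, including general principles such as treating infrastructure as code and ensuring idempotency. It details code organization, common patterns, performance optimization, security, testing, and tooling. For example, it recommends managing infrastructure with Terraform, adopting a microservices architecture, and automating CI/CD pipelines.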
description: This rule covers best practices for developing, deploying, and maintaining applications using AWS Elastic Container Service (ECS). It includes guidance on code organization, performance, security, testing, and common pitfalls.
globs: *.tf,*.yml,*.yaml,*.json,*.sh,*.dockerfile
---
- **General Principles:**
- **Infrastructure as Code (IaC):** Treat your ECS infrastructure as code. Use tools like Terraform, AWS CloudFormation, or AWS CDK to define and manage your ECS clusters, task definitions, and services.
- **Configuration Management:** Externalize configuration from your application code. Use environment variables, AWS Secrets Manager, or AWS Systems Manager Parameter Store to manage configuration data.
- **Idempotency:** Ensure that your deployment scripts and automation are idempotent. This means that running the same script multiple times should have the same result as running it once.
- **Automation:** Automate all aspects of your ECS deployment, from building container images to deploying services and scaling resources.
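The idempotency principle above can be sketched as an "ensure" function that compares current state with desired state before acting. This is a minimal illustration, not a real deployment script: `FakeRegistry` and `ensure_service` are hypothetical names, and the in-memory dictionary stands in for whatever API the script would actually call.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRegistry:
    """Hypothetical in-memory stand-in for a real service API."""
    services: dict = field(default_factory=dict)
    writes: int = 0  # counts state-changing calls

    def put(self, name: str, desired_count: int) -> None:
        self.services[name] = desired_count
        self.writes += 1


def ensure_service(registry: FakeRegistry, name: str, desired_count: int) -> bool:
    """Create or update the service only if it differs from the desired state.

    Returns True when a change was made, False when nothing needed doing.
    """
    if registry.services.get(name) == desired_count:
        return False  # already in the desired state: no-op
    registry.put(name, desired_count)
    return True


registry = FakeRegistry()
first = ensure_service(registry, "web", 3)   # makes a change
second = ensure_service(registry, "web", 3)  # second run changes nothing
```

Running the script twice leaves the registry exactly as it was after the first run, which is the property to aim for in deployment automation.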
- **1. Code Organization and Structure:**
- **1.1 Directory Structure:**
- **Project Root:** The base directory containing all project-related files.
  - `infrastructure/`: Contains IaC code (Terraform, CloudFormation) for ECS clusters, task definitions, services, VPC, etc.
  - `application/`: Contains source code for your containerized application.
    - `src/`: Application source code.
    - `tests/`: Unit, integration, and end-to-end tests.
    - `Dockerfile`: Dockerfile for building the application container image.
  - `scripts/`: Contains deployment, build, and other utility scripts.
  - `docs/`: Documentation for the project.
  - `README.md`: Project README file.
- **1.2 File Naming Conventions:**
- **Terraform:** `main.tf`, `variables.tf`, `outputs.tf`, `<module_name>.tf`
- **CloudFormation:** `<stack_name>.yml` or `<stack_name>.yaml`
- **Docker:** `Dockerfile` (no extension), `.dockerignore`
- **Scripts:** `<script_name>.sh` (Bash), `<script_name>.py` (Python)
- **Configuration:** `config.yml`, `config.json`, `.env`
- **1.3 Module Organization:**
- **Terraform Modules:** Create reusable Terraform modules for common ECS infrastructure components (e.g., ECS cluster, task definition, load balancer).
- **Application Modules:** Organize your application code into logical modules based on functionality (e.g., authentication, API, database access).
- Use separate Git repositories for independent services or applications, and deploy each one as its own ECS service. This avoids monolithic deployments and allows independent scaling and versioning.
- **1.4 Component Architecture:**
- **Microservices:** Design your application as a collection of microservices, each deployed as an independent ECS service.
- **API Gateway:** Use an API gateway (e.g., Amazon API Gateway) to route requests to different ECS services.
- **Message Queue:** Use a message queue (e.g., Amazon SQS, Amazon MQ) for asynchronous communication between services.
- **Database:** Use a managed database service (e.g., Amazon RDS, Amazon DynamoDB) for data persistence. Ensure proper security and access control configurations.
- **1.5 Code Splitting:**
- Break down large applications or services into smaller, more manageable components that can be deployed independently.
- Optimize container images to only include code required for each specific component.
- Utilize ECS services to deploy each component separately, enabling independent scaling and upgrades.
- **2. Common Patterns and Anti-patterns:**
- **2.1 Design Patterns:**
- **Sidecar:** Use sidecar containers (e.g., for logging, monitoring, or service mesh) to add functionality to your main application container without modifying the application code. Define the sidecar as an additional container in the same task definition.
- **Backend for Frontend (BFF):** Create a BFF layer that provides a specific API for each client application, improving performance and security.
- **Strangler Fig:** Gradually migrate a legacy application to ECS by introducing new microservices and slowly replacing the old functionality.
- **Aggregator Pattern:** Use an aggregator service to combine data from multiple backend services into a single response.
- **2.2 Recommended Approaches:**
- **Health Checks:** Implement robust health checks for your containers and configure ECS to use them. Define container health checks in the task definition (`healthCheck`), since ECS does not act on a `HEALTHCHECK` that exists only in the `Dockerfile`, and use load balancer (ELB) target group health checks for service availability.
- **Service Discovery:** Use service discovery (e.g., AWS Cloud Map) to enable services to find each other dynamically.
- **Load Balancing:** Use a load balancer (e.g., Application Load Balancer, Network Load Balancer) to distribute traffic across multiple ECS tasks.
- **Auto Scaling:** Configure auto scaling to automatically adjust the number of ECS tasks based on demand. Use CloudWatch metrics (CPU, memory, request count) as scaling triggers.
- **Immutable Infrastructure:** Avoid making changes to running containers. Instead, redeploy a new container image with the changes.
- **ECS Exec:** Use ECS Exec for debugging instead of opening SSH sessions to the underlying EC2 instances.
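The health check recommendation above can be sketched as a container definition that carries its own `healthCheck` block. The field names follow the ECS task definition format; the `/health` path, the port, and the image name are assumptions for illustration, and the command only works if the image ships `curl`.

```python
import json


def container_definition(name: str, image: str, port: int) -> dict:
    """Build a container definition with a container-level health check.

    The /health path is an assumed endpoint exposed by the application.
    """
    return {
        "name": name,
        "image": image,
        "essential": True,
        "portMappings": [{"containerPort": port, "protocol": "tcp"}],
        "healthCheck": {
            "command": ["CMD-SHELL", f"curl -f http://localhost:{port}/health || exit 1"],
            "interval": 30,     # seconds between checks
            "timeout": 5,       # seconds before a check counts as failed
            "retries": 3,       # consecutive failures before marking unhealthy
            "startPeriod": 10,  # grace period after container start
        },
    }


# Hypothetical image name, used only for illustration.
definition = container_definition("web", "example/web:1.0.0", 8080)
print(json.dumps(definition, indent=2))
```

The resulting dictionary could be placed in the `containerDefinitions` list of a task definition; the timing values here are placeholders to tune per application.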
- **2.3 Anti-patterns:**
- **Hardcoding Configuration:** Avoid hardcoding configuration values in your application code.
- **Large Container Images:** Keep your container images small by using multi-stage builds and removing unnecessary dependencies.
- **Ignoring Security Best Practices:** Neglecting security best practices can lead to vulnerabilities. Always follow security best practices for container images, task definitions, and IAM roles.
- **Manual Scaling:** Manually scaling ECS tasks is inefficient and error-prone. Use auto scaling instead.
- **Monolithic Containers:** Avoid creating single containers that run multiple applications. Break them down into smaller, single-purpose containers.
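As a counterpart to the hardcoded-configuration anti-pattern, the sketch below reads settings from environment variables, which ECS can inject through the task definition. The variable names (`DATABASE_URL`, `LOG_LEVEL`, `WORKER_COUNT`) and the `Settings` class are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    worker_count: int


def load_settings(env=None) -> Settings:
    """Read settings from environment variables instead of hardcoding them."""
    env = os.environ if env is None else env
    try:
        database_url = env["DATABASE_URL"]  # required: fail fast if missing
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Settings(
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "INFO"),          # optional with default
        worker_count=int(env.get("WORKER_COUNT", "2")),  # optional with default
    )


# Passing a dict keeps the example independent of the real environment.
settings = load_settings({"DATABASE_URL": "postgres://db.internal/app", "WORKER_COUNT": "4"})
```

Failing at startup when a required value is missing tends to surface misconfiguration during deployment rather than at the first request.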
- **2.4 State Management:**
- **Stateless Applications:** Design your applications to be stateless whenever possible. Store state in a database or other external storage service.
- **Persistent Storage:** Use persistent storage volumes (e.g., Amazon EFS, Amazon EBS) for stateful applications.
- **Container Lifecycle:** Understand the container lifecycle and how ECS manages container restarts and replacements.
- **2.5 Error Handling:**
- **Logging:** Implement comprehensive logging and monitoring for your applications. Log to stdout/stderr, and use a log driver (e.g., `awslogs`, or `awsfirelens` for FireLens) to ship logs to a central logging service (e.g., CloudWatch Logs, Splunk).
- **Exception Handling:** Implement proper exception handling in your application code.
- **Retry Logic:** Implement retry logic for transient errors.
- **Dead Letter Queues:** Use dead letter queues (DLQs) to handle messages that cannot be processed after multiple retries.
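The retry and dead-letter guidance above can be sketched together. This is an in-process illustration under stated assumptions: `TransientError` and `process_with_retry` are hypothetical names, and the plain list stands in for a real dead letter queue (with a managed queue, redelivery and dead-lettering are usually configured on the queue itself).

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, throttling)."""


def process_with_retry(message, handler, dead_letters, max_attempts=3,
                       base_delay=0.1, sleep=time.sleep):
    """Call handler(message), retrying transient failures with backoff.

    After max_attempts failures the message is appended to dead_letters.
    Returns the handler result, or None if the message was dead-lettered.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except TransientError:
            if attempt == max_attempts:
                dead_letters.append(message)
                return None
            # Exponential backoff with full jitter to avoid retry storms.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


calls = {"n": 0}


def flaky(msg):
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("try again")
    return msg.upper()


dlq = []
result = process_with_retry("order-1", flaky, dlq, sleep=lambda _: None)
```

Only errors classified as transient are retried; anything else propagates immediately, so a permanent failure is not retried pointlessly.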
- **3. Performance Considerations:**
- **3.1 Optimization Techniques:**
- **Resource Allocation:** Properly size your ECS tasks based on the application's resource requirements (CPU, memory). Monitor resource utilization and adjust task sizes as needed.
- **Container Image Optimization:** Optimize container images by minimizing size, using efficient base images, and leveraging layer caching. Use `docker image prune` on build hosts to reclaim disk space from unused images.
- **Load Balancing:** Configure load balancing algorithms and connection draining to optimize traffic distribution and minimize downtime.
- **Caching:** Implement caching at various levels (e.g., application, CDN) to reduce latency and improve performance.
- **Connection Pooling:** Reuse database connections to minimize overhead.
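Connection pooling, mentioned above, can be sketched with a fixed-size queue of open connections. This is a minimal illustration: `ConnectionPool` is a hypothetical class and `sqlite3` stands in for a real database driver, many of which ship their own pooling.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Minimal fixed-size pool; sqlite3 stands in for a real database driver."""

    def __init__(self, database: str, size: int = 2):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets pooled connections move between threads.
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._pool.get(timeout=timeout)  # blocks until one is free
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return to the pool instead of closing


pool = ConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]
```

The pool size multiplied by the number of running tasks should stay within the database's connection limit, which matters once auto scaling adds tasks.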
- **3.2 Memory Management:**
- **Memory Limits:** Set appropriate memory limits for your containers. Monitor memory usage and adjust limits to prevent out-of-memory errors.
- **Garbage Collection:** Optimize garbage collection settings for your application's runtime environment.
- **Memory Leaks:** Identify and fix memory leaks in your application code.
- **3.3 Bundle Size Optimization:**
- **Code Splitting:** Split your application code into smaller bundles to reduce the initial load time. Utilize dynamic imports to only load code when it's needed.
- **Tree Shaking:** Remove unused code from your application bundle.
- **Minification:** Minify your application code to reduce its size.
- **Compression:** Serve compressed assets (e.g., gzip or Brotli) to reduce the amount of data transferred.
- **3.4 Lazy Loading:**
- Load resources (e.g., images, data) only when they are needed.
- Use lazy-loading techniques to improve the initial load time of your application.
- **4. Security Best Practices:**
- **4.1 Common Vulnerabilities:**
- **Container Image Vulnerabilities:** Unpatched vulnerabilities in container images can be exploited by attackers. Regularly scan container images for vulnerabilities and apply patches.
- **IAM Role Misconfiguration:** Overly permissive IAM roles can allow attackers to access sensitive resources. Follow the principle of least privilege and grant only the necessary permissions.
- **Network Security Misconfiguration:** Misconfigured network security can expose your ECS services to unauthorized access. Use security groups and network ACLs to restrict network access.
- **Secrets Management Vulnerabilities:** Storing secrets in plain text can expose them to attackers. Use a secrets management service (e.g., AWS Secrets Manager) to securely store and manage secrets.
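The least-privilege and secrets guidance above can be sketched as an IAM policy document scoped to a single secret. The ARN, account ID, and function name are hypothetical; the policy structure and the `secretsmanager:GetSecretValue` action follow the standard IAM policy format.

```python
import json


def secret_read_policy(secret_arn: str) -> dict:
    """Build an IAM policy that allows reading exactly one secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [secret_arn],  # a specific ARN rather than "*"
            }
        ],
    }


# Hypothetical ARN for illustration only.
policy = secret_read_policy(
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:app/db-password-AbCdEf"
)
print(json.dumps(policy, indent=2))
```

A policy like this would be attached to the role that needs to read the secret, keeping every other secret in the account out of reach.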
- **4.2 Input Validation:**
- Validate all input data to prevent injection attacks (e.g., SQL injection, command injection).
- Use a framework or library to perform input validation.
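Input validation and injection prevention can be sketched with an allow-list pattern plus a parameterized query. The `users` table, the username rule, and the function names are assumptions; `sqlite3` is used only so the example runs without external services.

```python
import re
import sqlite3

# Allow-list: 3 to 32 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,32}")


def validate_username(raw: str) -> str:
    """Reject anything outside the expected shape."""
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError("invalid username")
    return raw


def find_user(conn: sqlite3.Connection, raw_username: str):
    username = validate_username(raw_username)
    # Placeholders keep user input out of the SQL text itself.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
row = find_user(conn, "alice")
```

Validation and parameterization are complementary: the first rejects malformed input early, the second ensures that whatever passes is never interpreted as SQL.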
- **4.3 Authentication and Authorization:**
- Implement authentication and authorization to control access to your ECS services.
- Use a standard authentication protocol (e.g., OAuth 2.0, OpenID Connect).
- Use fine-grained authorization policies to restrict access to specific resources.
- **4.4 Data Protection:**
- Encrypt sensitive data at rest and in transit.
- Use HTTPS to encrypt data in transit.
- Use AWS KMS to encrypt data at rest.
- Implement data masking and tokenization to protect sensitive data.
- **4.5 Secure API Communication:**
- Use HTTPS for all API communication.
- Implement authentication and authorization for all API endpoints.
- Validate all input data to prevent injection attacks.
- Protect against Cross-Site Request Forgery (CSRF) attacks.
- **5. Testing Approaches:**
- **5.1 Unit Testing:**
- Write unit tests for individual components of your application.
- Use a unit testing framework (e.g., JUnit, pytest).
- Mock external dependencies to isolate the component being tested.
- **5.2 Integration Testing:**
- Write integration tests to verify the interaction between different components of your application.
- Use a test environment that is similar to the production environment.
- Test the integration with external services (e.g., databases, message queues).
- **5.3 End-to-End Testing:**
- Write end-to-end tests to verify the functionality of the entire application.
- Use an end-to-end testing framework (e.g., Selenium, Cypress).
- Test the application from the user's perspective.
- **5.4 Test Organization:**
- Organize your tests into a logical directory structure.
- Use meaningful names for your test files and test methods.
- Use a test runner to execute your tests.
- **5.5 Mocking and Stubbing:**
- Use mocking and stubbing to isolate components during testing.
- Use a mocking framework (e.g., Mockito, EasyMock).
- Create mock objects that simulate the behavior of external dependencies.
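The mocking approach above can be sketched with Python's standard `unittest.mock`. The function under test, `running_task_count`, is hypothetical; it takes the client as a parameter so a mock can replace it, and the response shape mirrors what an ECS `describe_services` call returns, trimmed to the one field used here.

```python
from unittest.mock import Mock


def running_task_count(ecs_client, cluster: str, service: str) -> int:
    """Return the service's running task count using an injected client."""
    response = ecs_client.describe_services(cluster=cluster, services=[service])
    return response["services"][0]["runningCount"]


# The Mock simulates the client, so no AWS call is made.
fake_client = Mock()
fake_client.describe_services.return_value = {"services": [{"runningCount": 3}]}

count = running_task_count(fake_client, "demo-cluster", "web")

# Verify the dependency was called as expected.
fake_client.describe_services.assert_called_once_with(
    cluster="demo-cluster", services=["web"]
)
```

Injecting the client rather than constructing it inside the function is what makes the component testable in isolation.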
- **6. Common Pitfalls and Gotchas:**
- **6.1 Frequent Mistakes:**
- **Incorrect IAM Permissions:** Granting insufficient or excessive IAM permissions can lead to security vulnerabilities or application failures.
- **Misconfigured Network Settings:** Misconfigured network settings can prevent containers from communicating with each other or with external services.
- **Insufficient Resource Limits:** Setting insufficient resource limits (CPU, memory) can cause containers to crash or become unresponsive.
- **Ignoring Health Checks:** Ignoring health checks can lead to ECS tasks being considered healthy even when they are not.
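- **Failing to Handle SIGTERM:** Applications in containers must handle SIGTERM gracefully so that ECS can shut down tasks cleanly. ECS sends SIGTERM when stopping a task and follows with SIGKILL once the stop timeout (30 seconds by default) elapses.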
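Graceful SIGTERM handling can be sketched as a handler that sets a flag which the main loop checks between units of work. The worker loop and function names are hypothetical; a real service would drain in-flight requests and close connections where the final comment indicates.

```python
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Flag shutdown instead of exiting in the middle of a unit of work."""
    shutdown_requested.set()


def run_worker(poll_interval: float = 1.0) -> None:
    """Loop until a shutdown is requested, then clean up and return."""
    while not shutdown_requested.is_set():
        # ... do one unit of work here ...
        shutdown_requested.wait(poll_interval)
    # Close connections and flush buffers here before the process exits.


def main() -> None:
    # Register the handler from the main thread before starting work.
    signal.signal(signal.SIGTERM, handle_sigterm)
    run_worker()
```

Calling `main()` from the container's entrypoint lets the process exit on its own once SIGTERM arrives. The application must run as the container's main process (or behind an init that forwards signals) for the signal to reach it.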
- **6.2 Edge Cases:**
- **Task Placement Constraints:** Be aware of task placement constraints (e.g., availability zone, instance type) and how they can affect task scheduling.
- **Service Discovery Limitations:** Understand the limitations of service discovery and how it can affect service resolution.
- **Load Balancer Connection Draining:** Be aware of load balancer connection draining and how it can affect application availability during deployments.
- **6.3 Version-Specific Issues:**
- Stay up-to-date with the latest ECS agent and Docker versions to avoid known issues and security vulnerabilities.
- Check the AWS documentation for any version-specific issues or known bugs.
- **6.4 Compatibility Concerns:**
- Be aware of compatibility issues between ECS and other AWS services (e.g., VPC, IAM, CloudWatch).
- Test your application in an environment that mirrors the versions used in production (e.g., Fargate platform version, ECS agent, AWS SDK) to catch incompatibilities early.
- **6.5 Debugging Strategies:**
- **Logging:** Use comprehensive logging to track application behavior and identify errors.
- **Monitoring:** Use monitoring tools (e.g., CloudWatch) to track resource utilization and application performance.
- **Debugging Tools:** Use debugging tools (e.g., `docker exec` on EC2 container instances, AWS Systems Manager Session Manager) to troubleshoot running containers.
- **ECS Exec:** Utilize the ECS Exec feature to directly connect to containers for debugging purposes.
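An ECS Exec session is started with the AWS CLI's `aws ecs execute-command`. The sketch below only assembles the argument list; the cluster name, task ID, and container name are hypothetical, and it assumes ECS Exec is enabled on the service and the Session Manager plugin is installed locally.

```python
import shlex


def ecs_exec_command(cluster: str, task: str, container: str,
                     command: str = "/bin/sh") -> list:
    """Build the argv for `aws ecs execute-command` (interactive session)."""
    return [
        "aws", "ecs", "execute-command",
        "--cluster", cluster,
        "--task", task,
        "--container", container,
        "--interactive",
        "--command", command,
    ]


# Hypothetical cluster, task ID, and container name.
argv = ecs_exec_command("demo-cluster", "0123456789abcdef0", "web")
print(shlex.join(argv))
```

Building the command as a list avoids shell-quoting mistakes if it is later passed to `subprocess.run`.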
- **7. Tooling and Environment:**
- **7.1 Recommended Tools:**
- **Terraform:** For infrastructure as code.
- **AWS CLI:** For interacting with AWS services from the command line.
- **Docker:** For building and managing container images.
- **AWS SAM CLI:** For local development of serverless applications.
- **Visual Studio Code (VS Code):** With AWS and Docker extensions for development and debugging.
- **AWS CDK:** Define ECS infrastructure using general-purpose programming languages. (cdk8s plays the same role for Kubernetes applications; it does not target ECS.)
- **7.2 Build Configuration:**
- **Makefile:** Use a Makefile to automate build, test, and deployment tasks.
- **Build Scripts:** Use build scripts to customize the build process.
- **CI/CD Pipeline:** Integrate your build process with a CI/CD pipeline (e.g., AWS CodePipeline, Jenkins).
- **7.3 Linting and Formatting:**
- **Linters:** Use linters (e.g., ESLint, Pylint) to enforce code style and identify potential errors.
- **Formatters:** Use formatters (e.g., Prettier, Black) to automatically format your code.
- **Editor Integration:** Integrate linters and formatters with your code editor.
- **7.4 Deployment:**
- **Blue/Green Deployments:** Use blue/green deployments to minimize downtime during deployments.
- **Canary Deployments:** Use canary deployments to gradually roll out new versions of your application.
- **Rolling Updates:** Use rolling updates to gradually update ECS tasks with the new version.
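For rolling updates, an ECS service's deployment configuration sets `minimumHealthyPercent` and `maximumPercent`, which bound how many tasks run during a deployment. The sketch below computes those bounds, assuming the lower bound rounds up and the upper bound rounds down; the function name is hypothetical.

```python
def rolling_update_bounds(desired_count: int, minimum_healthy_percent: int,
                          maximum_percent: int) -> tuple:
    """Return (min_running, max_running) task counts during a rolling update.

    Assumes the lower bound rounds up and the upper bound rounds down.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    min_running = -(-desired_count * minimum_healthy_percent // 100)  # ceiling
    max_running = desired_count * maximum_percent // 100              # floor
    return min_running, max_running


bounds = rolling_update_bounds(desired_count=4, minimum_healthy_percent=50,
                               maximum_percent=200)
```

With four desired tasks, 50% minimum healthy, and 200% maximum, the deployment may drop to two running tasks and grow to eight. A higher minimum slows the rollout but preserves more capacity while it runs.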
- **7.5 CI/CD Integration:**
- **Automated Builds:** Automate the build process using a CI/CD pipeline.
- **Automated Tests:** Run automated tests as part of the CI/CD pipeline.
- **Automated Deployments:** Automate the deployment process using a CI/CD pipeline.