AWS Cloud Design Patterns: Architecture and Practical Examples
AWS Cloud Design Patterns are reusable solutions for common problems when you build systems on AWS: high availability, scaling, storage, batch processing, and operations. Instead of starting from a blank page, you combine patterns like building blocks.
The original catalogue of patterns is maintained at AWS Cloud Design Pattern. This post:
- Lists the full pattern catalogue (grouped as on the reference site) with links to each pattern page.
- Provides original, high-level descriptions and reference architectures for several of the most practically useful patterns.
When studying or designing systems, you can use this as a quick companion to the official pattern pages.
0. Pattern catalogue (with links)
This section mirrors the structure of the AWS Cloud Design Pattern site and links to each individual pattern for deeper reading.
0.1 Basic Patterns
- Snapshot Pattern (Data Backups)
- Stamp Pattern (Server Replication)
- Scale Up Pattern (Dynamic Server Spec Up/Down)
- Scale Out Pattern (Dynamically Increasing the Number of Servers)
- On-demand Disk Pattern (Dynamically Increasing/Decreasing Disk Capacity)
0.2 Patterns for High Availability
- Multi-Server Pattern (Server Redundancy)
- Multi-Datacenter Pattern (Redundancy on the Data Center Level)
- Floating IP Pattern (Floating IP Address)
- Deep Health Check Pattern (System Health Check)
0.3 Patterns for Processing Dynamic Content
- Clone Server Pattern (Cloning a Server)
- NFS Sharing Pattern (Using Shared Content)
- NFS Replica Pattern (Replicating Shared Content)
- State Sharing Pattern (Sharing State Information)
- URL Rewriting Pattern (Saving Static Content)
- Rewrite Proxy Pattern (Proxy Setup for URL Overwriting)
- Cache Proxy Pattern (Cache Provisioning)
- Scheduled Scale Out Pattern (Increasing or Decreasing the Number of Servers Following a Schedule)
0.4 Patterns for Processing Static Content
- Web Storage Pattern (Use of High-Availability Internet Storage)
- Direct Hosting Pattern (Direct Hosting Using Internet Storage)
- Private Distribution Pattern (Data Delivery to Specified Users)
- Cache Distribution Pattern (Locating Data in a Location That Is Physically Near to the User)
- Rename Distribution Pattern (Delivery Without Update Delay)
0.5 Patterns for Uploading Data
- Write Proxy Pattern (High-Speed Uploading to Internet Storage)
- Storage Index Pattern (Increasing the Efficiency of Internet Storage)
- Direct Object Upload Pattern (Simplifying the Upload Procedure)
0.6 Patterns for Relational Database
- DB Replication Pattern (Replicating Online Databases)
- Read Replica Pattern (Load Distribution through Read Replicas)
- Inmemory DB Cache Pattern (Caching High-Frequency Data)
- Sharding Write Pattern (Improving Efficiency in Writing)
0.7 Patterns for Batch Processing
- Queuing Chain Pattern (Loose-Coupling of Systems)
- Priority Queue pattern (Changing Priorities)
- Job Observer Pattern (Job Monitoring and Adding/Deleting Servers)
- Scheduled Autoscaling Pattern (Turning Batch Servers On and Off Automatically)
0.8 Pattern for Operation and Maintenance
- Bootstrap Pattern (Automatic Acquisition of Startup Settings)
- Cloud DI Pattern (External Placement of Parts That Are Frequently Updated)
- Stack Deployment Pattern (Creating a Template for Setting up Groups of Servers)
- Server Swapping Pattern (Transferring Servers)
- Monitoring Integration Pattern (Centralization of Monitoring Tools)
- Web Storage Archive Pattern (Archiving Large Volumes of Data)
- Weighted Transition Pattern (Transitioning Using a Weighted Round Robin DNS)
- Hybrid Backup Pattern (Using the Cloud for Backups)
0.9 Patterns for Network
- OnDemand NAT Pattern (Changing Internet Settings at the Time of Maintenance)
- Backnet Pattern (Establishment of a Management Network)
- Functional Firewall Pattern (Multi-Tier Access Control)
- Operational Firewall Pattern (Controlling Access by Individual Function)
- Multi Load Balancer Pattern (Setting Up Multiple Load Balancers)
- WAF Proxy Pattern (Effective Use of a Costly Web Application Firewalls)
- CloudHub Pattern (Setting Up VPN Sites)
This post then focuses on a subset of these patterns and gives original summaries and reference architectures for them.
1. Basic patterns
1.1 Snapshot Pattern (Data Backups)
Problem
You need consistent, low‑impact backups of data (volumes, databases) with fast restore options.
Architecture
Application -> EBS volumes / RDS instance
|
v
Snapshots (stored in S3, managed by AWS)
|
Backup plans / lifecycle policies (retention, copy)
- Primary data lives on EBS volumes, RDS, or EFS.
- Snapshots are stored in S3 (managed by AWS) and can be copied across regions/accounts.
Implementation
- EBS volumes
- Use CreateSnapshot (CLI/SDK) or Amazon Data Lifecycle Manager policies.
- Tag volumes and apply policies per tag (Backup=Daily, Backup=Hourly).
- RDS
- Enable automated backups and create manual DB snapshots before risky changes.
- AWS Backup
- Create a backup plan with schedules and retention.
- Attach resources via tags or explicit ARNs.
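As a rough illustration of the tag-driven approach above, the sketch below snapshots every EBS volume carrying a given tag. It assumes `ec2` behaves like `boto3.client("ec2")`; the function name is hypothetical, pagination is ignored, and in practice Data Lifecycle Manager or AWS Backup would usually do this for you.

```python
from datetime import datetime, timezone


def snapshot_tagged_volumes(ec2, tag_key="Backup", tag_value="Daily"):
    """Create one snapshot per EBS volume carrying the given tag.

    `ec2` is expected to behave like boto3.client("ec2").
    Returns the list of new snapshot IDs.
    """
    # Find volumes by tag; pagination is ignored to keep the sketch short.
    response = ec2.describe_volumes(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    snapshot_ids = []
    for volume in response["Volumes"]:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"{tag_value} backup of {volume['VolumeId']} at {stamp}",
            # Copy the tag onto the snapshot so retention tooling can find it.
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [{"Key": tag_key, "Value": tag_value}],
            }],
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.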
Benefits
- Restore to a known point (the snapshot time, or any moment within the retention window for RDS automated backups) with minimal downtime.
- High durability (S3 under the hood).
- Minimal code: mostly configuration.
Trade‑offs
- Snapshots are infrastructure-level: they don’t know about app‑level consistency unless you coordinate (e.g. quiesce writes).
- Cross‑region copies cost extra but are critical for DR.
1.2 Scale Up / Scale Out Pattern
Problem
Your load changes over time and you want to handle peaks without over‑provisioning.
Architecture
Clients
|
[ALB / NLB]
|
[Auto Scaling Group (EC2) or ECS/EKS service]
^
|
Scaling policies (CPU, QPS, custom metrics)
Implementation
- Use Auto Scaling Groups for EC2 or ECS/EKS Service Auto Scaling for containers.
- Attach policies:
- Target utilisation (e.g. keep average CPU at 50%).
- Request count per target.
- Custom CloudWatch metrics (SQS queue depth, latency).
- Configure min / max / desired capacity and cooldowns.
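To make the target-utilisation idea concrete, here is a simplified proportional model of how a target-tracking policy might pick a new capacity. AWS does not publish its exact algorithm, so treat the formula and the function name as assumptions for illustration only.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Proportional estimate of the capacity that brings the metric back to target."""
    if current <= 0 or target_value <= 0:
        return min_size
    # If 4 instances run at 80% and the target is 50%, we need 4 * 80 / 50 = 6.4 -> 7.
    raw = math.ceil(current * metric_value / target_value)
    # Never go outside the group's configured bounds.
    return max(min_size, min(max_size, raw))
```

For example, `desired_capacity(4, 80, 50, 2, 10)` yields 7, while a quiet period at 20% CPU shrinks the same group to the minimum of 2.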
Benefits
- Capacity tracks demand; you pay closer to what you use.
- Works well with stateless applications and multi‑AZ deployments.
Trade‑offs
- Instances need time to warm up; scale‑out is not instant.
- Stateful workloads need extra work (sharding, sticky sessions, caches).
2. High‑availability patterns
2.1 Multi‑AZ / Multi‑Datacenter Pattern
Problem
You want your application to remain available even if an AZ fails.
Architecture
Region
+-----------------------------+
| VPC |
| +---------+ +---------+ |
| | AZ A | | AZ B | |
| | EC2 | | EC2 | |
| | RDS-A |<-|-> RDS-B | |
| +---------+ +---------+ |
+--------------|--------------+
|
[ALB]
- Compute (EC2/ECS/EKS) spread across multiple subnets in different AZs.
- Multi‑AZ RDS or Aurora cluster for the database.
- ALB/NLB distributes traffic across AZs.
Implementation
- For web backends:
- Auto Scaling Group with subnets from at least two AZs.
- ALB with targets in each AZ.
- For data:
- Enable Multi‑AZ for RDS instances or use Aurora with multiple instances.
- Store durable assets in S3; the standard storage classes already replicate across multiple AZs (the One Zone classes do not).
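The "subnets from at least two AZs" rule can be checked before deployment. The sketch below validates a subnet-to-AZ mapping and shows an even split of desired capacity; Auto Scaling does its own balancing, so this only illustrates the idea, and the function name and subnet IDs are hypothetical.

```python
def spread_across_azs(desired, subnet_azs):
    """Split `desired` instances as evenly as possible over the AZs behind the subnets.

    `subnet_azs` maps subnet ID -> Availability Zone name.
    """
    zones = sorted(set(subnet_azs.values()))
    if len(zones) < 2:
        raise ValueError("need subnets in at least two AZs for this pattern")
    base, extra = divmod(desired, len(zones))
    # The first `extra` zones (alphabetical order) take one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}
```

With five instances over `eu-west-1a` and `eu-west-1b`, the split is three and two, so losing either AZ still leaves capacity running.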
Benefits
- Survives AZ‑level failures with minimal/no downtime.
- Separates infrastructure faults from application faults.
Trade‑offs
- Slightly higher cost (extra instances in another AZ).
- You still need backups and cross‑region DR for region‑wide outages.
2.2 Multi‑Region Read Pattern (for latency & DR)
Problem
Users are globally distributed. A single region causes high latency and is a single point of failure.
Architecture (read‑mostly workloads)
Region A (primary) Region B (secondary)
------------------- --------------------
Writes + Reads Read replicas only
[RDS/Aurora writer] ---> [Aurora global / read replica]
| |
[ALB] [ALB]
^ ^
Users (A) Users (B)
Implementation
- For relational data:
- Use Aurora Global Database or cross‑region RDS read replicas.
- For static/content data:
- Rely on S3 + CloudFront (global edge network).
- DNS:
- Use Route 53 latency‑based routing or failover routing between regions.
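The routing decision can be pictured as "lowest latency among healthy regions, otherwise fall back". The sketch below models that choice in plain Python; it is not how Route 53 is implemented, and the fail-open fallback to the primary region is an assumption made for this example.

```python
def pick_region(latency_ms, healthy, primary):
    """Return the healthy region with the lowest measured latency.

    `latency_ms` maps region -> latency seen by the user,
    `healthy` is the set of regions currently passing health checks.
    """
    candidates = [region for region in latency_ms if region in healthy]
    if not candidates:
        # Assumption: with nothing healthy, send traffic to the primary anyway.
        return primary
    # Break latency ties by region name so the result is deterministic.
    return min(candidates, key=lambda region: (latency_ms[region], region))
```

A user measuring 35 ms to `eu-west-1` and 120 ms to `us-east-1` lands in `eu-west-1` until its health check fails, at which point the same call returns `us-east-1`.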
Benefits
- Lower latency for regional users.
- Foundation for disaster recovery (promote secondary region).
Trade‑offs
- Increased complexity, especially for writes across regions (eventual consistency, conflict resolution).
- More infrastructure cost.
3. Data & storage patterns
3.1 Web Storage / Static Hosting Pattern
Problem
You want simple, highly available hosting for static sites or assets.
Architecture
Users
|
[CloudFront CDN] <----> [S3 bucket: static content]
Implementation
- Put all static content (HTML, CSS, JS, images) in S3.
- Configure CloudFront:
- Origin = S3 bucket (or S3 static website endpoint).
- Behaviours with long TTL for immutable assets.
- Use Route 53 + ACM to serve https://yourdomain.com with a custom cert.
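Long TTLs are only safe for files whose names change when their content changes. One way to express that, assuming a build step that embeds a content hash in asset names (a hypothetical convention such as `app.3f9a1c2b.js`), is to choose the `Cache-Control` value per object at upload time:

```python
import re

# Hypothetical naming convention: a build step embeds a content hash, e.g. app.3f9a1c2b.js
FINGERPRINT = re.compile(r"\.[0-9a-f]{8,}\.(?:js|css|png|jpg|svg|woff2)$")


def cache_control_for(key):
    """Pick a Cache-Control header value for an S3 object key."""
    if FINGERPRINT.search(key):
        # Content-hashed files never change under the same name: cache for a year.
        return "public, max-age=31536000, immutable"
    # HTML and other mutable files: keep the TTL short so updates show up quickly.
    return "public, max-age=60"
```

The returned string could be passed as the `CacheControl` argument when uploading with boto3; the specific TTL values are illustrative and should be tuned to your release cadence.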
Benefits
- Zero servers to manage.
- Global performance with CloudFront edge locations.
- Very low cost for many workloads.
Trade‑offs
- Only for static content. Dynamic features must call APIs (Lambda, API Gateway, etc.).
- Cache invalidation policies must be planned.
3.2 Read Replica Pattern (Relational DB)
Problem
You have read‑heavy workloads causing load on the primary database.
Architecture
Writes Reads
App ----------------> [Primary DB]
/ | \
v v v
[Read replicas]
Implementation
- In RDS, create one or more read replicas.
- In Aurora, use the reader endpoint or named endpoints per replica.
- Route:
- OLTP reads that need fresh data → primary.
- Reports, dashboards, background jobs → replicas.
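In application code, the routing rule above often ends up as a small helper that callers must opt into. A minimal sketch, assuming hypothetical endpoint strings and a simple round-robin over replicas:

```python
import itertools


class EndpointRouter:
    """Route writes and fresh reads to the primary, stale-tolerant reads to replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint(self, *, write=False, allow_stale=False):
        # Default to the primary: callers must explicitly accept stale data.
        if write or not allow_stale or self._replicas is None:
            return self.primary
        return next(self._replicas)
```

Making `allow_stale` an explicit keyword keeps the "which queries tolerate lag" decision visible at each call site. With Aurora, the reader endpoint already load-balances, so a single replica entry may be enough.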
Benefits
- Offloads heavy read queries from the primary.
- Horizontal read scalability.
Trade‑offs
- Replication lag → replicas are eventually consistent.
- Need clear guidance in code: which queries can tolerate stale data.
4. Batch & integration patterns
4.1 Queuing Chain Pattern (Loose Coupling with SQS)
Problem
Producers and consumers should not depend on each other’s availability or throughput.
Architecture
Producers Queue Workers (auto-scaled)
| | |
v v v
[App / ETL] ---> [SQS] ---> [Lambda / ECS / Batch]
Implementation
- SQS:
- Standard queue for at‑least‑once delivery.
- FIFO queue if ordering and exactly‑once processing are crucial (with caveats).
- Consumers:
- Lambda with SQS trigger, or
- Containerised worker service (ECS/EKS) polling SQS.
- Scaling:
- Auto scale workers based on queue metrics (e.g. ApproximateNumberOfMessagesVisible, age of oldest message).
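Queue depth alone does not say how many workers are needed; it has to be combined with how fast one worker drains messages and how much delay is acceptable. The sketch below shows that arithmetic, with all parameter names and default bounds chosen for illustration:

```python
import math


def workers_needed(visible_messages, seconds_per_message, max_delay_seconds,
                   min_workers=1, max_workers=50):
    """Worker count that should drain the backlog within the acceptable delay."""
    # How many messages one worker can finish inside the delay budget.
    per_worker = max(1, math.floor(max_delay_seconds / seconds_per_message))
    needed = math.ceil(visible_messages / per_worker)
    # Keep the result inside the scaling bounds.
    return max(min_workers, min(max_workers, needed))
```

With 100 visible messages, 2 seconds per message, and a 60-second budget, each worker covers 30 messages, so the function asks for 4 workers. The visible-message count would come from the `ApproximateNumberOfMessagesVisible` metric.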
Benefits
- Decouples producers from consumers; spikes are buffered.
- Simple to scale consumers horizontally.
- Natural fit for event‑driven architectures.
Trade‑offs
- At‑least‑once delivery → consumers must be idempotent.
- Monitoring and DLQs are needed to catch poison messages.
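Idempotency is usually achieved by remembering which messages were already handled. A minimal sketch, in which an in-memory set stands in for a durable store and the function name is hypothetical:

```python
def process_once(message, seen_ids, handler):
    """Run `handler` at most once per message ID; return True if it ran."""
    message_id = message["MessageId"]
    if message_id in seen_ids:
        # Duplicate delivery: safe to delete the message without reprocessing.
        return False
    handler(message["Body"])
    # Record the ID only after the handler succeeds, so a crash leads to a retry.
    seen_ids.add(message_id)
    return True
```

In a real system the set would need to survive restarts and be shared between workers, for example a database table with a uniqueness constraint on the message ID.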
4.2 Priority Queue Pattern
Problem
Not all tasks are equal — some must be processed before others.
Architecture
High-priority producers ---> [SQS High]
Normal producers ---> [SQS Normal]
/ \
[Workers for High] [Workers for Normal]
Implementation
- Create separate queues (e.g. queue-high, queue-normal).
- Have:
- Workers dedicated to the high queue.
- Workers that can pull from both but prefer high priority when available.
- Use CloudWatch metrics and autoscaling per queue.
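A worker that serves both queues but prefers high priority can be sketched as a poll in priority order. This assumes `sqs` behaves like `boto3.client("sqs")`; the function name is hypothetical and the queue URLs are supplied by the caller.

```python
def receive_by_priority(sqs, queue_urls, max_messages=10):
    """Poll queues in priority order; return (queue_url, messages) for the first non-empty one."""
    for url in queue_urls:
        response = sqs.receive_message(
            QueueUrl=url,
            MaxNumberOfMessages=max_messages,
            WaitTimeSeconds=0,  # short poll so an empty high queue does not block the next one
        )
        # The response has no "Messages" key when nothing was received.
        messages = response.get("Messages", [])
        if messages:
            return url, messages
    return None, []
```

Because short polling every queue in a tight loop generates many empty requests, a real worker would likely sleep briefly or long-poll when every queue comes back empty.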
Benefits
- Predictable latency for critical tasks.
- Isolation between workloads at the queue level.
Trade‑offs
- Slightly more operational overhead (more queues, policies, metrics).
- Requires careful worker logic when consuming from multiple queues.
5. Operations & maintenance patterns
5.1 Bootstrap Pattern (Automatic Instance Initialisation)
Problem
You want new servers to configure themselves automatically when they start, rather than baking all config into AMIs.
Architecture
Launch Template / ASG
|
v
EC2 instance
|
+--> user-data script
|
+--> S3 / Parameter Store / SSM (download config, register, install agents)
Implementation
- Use user‑data scripts in launch templates:
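#!/bin/bash
set -euo pipefail  # abort the bootstrap on the first failed step
mkdir -p /etc/app  # make sure the target directory exists
# Fetch the (possibly encrypted) config from Parameter Store using the instance role
aws ssm get-parameter --name /app/config --with-decryption --query 'Parameter.Value' --output text > /etc/app/config.yml
systemctl start app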
- Store configuration in:
- Systems Manager Parameter Store or Secrets Manager.
- S3 (for larger files or templates).
- Optionally, use SSM Agent to push further config after boot.
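When configuration is split across several parameters rather than a single file, a bootstrap step can load a whole path at once. The sketch below assumes `ssm` behaves like `boto3.client("ssm")` and that parameters live under a hypothetical `/app/` prefix:

```python
def load_config(ssm, path="/app/"):
    """Fetch every parameter under `path` and return {relative_name: value}."""
    config, token = {}, None
    while True:
        kwargs = {"Path": path, "Recursive": True, "WithDecryption": True}
        if token:
            kwargs["NextToken"] = token
        page = ssm.get_parameters_by_path(**kwargs)
        for param in page["Parameters"]:
            # Strip the prefix so "/app/db/host" becomes "db/host".
            config[param["Name"][len(path):]] = param["Value"]
        token = page.get("NextToken")
        if not token:
            return config
```

The instance role needs permission to read that path (and to use the KMS key for encrypted values), which is where the "secure configuration sources" trade-off comes in.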
Benefits
- New instances/self‑healing nodes are ready without manual steps.
- Decouples image baking from configuration.
- Works well with Auto Scaling.
Trade‑offs
- Boot time increases slightly due to setup work.
- You must secure configuration sources (IAM roles, encryption).
5.2 Stack Deployment Pattern (Infrastructure as Code)
Problem
You want repeatable environments (dev/test/prod) and controlled infrastructure changes.
Architecture
Templates (CloudFormation / CDK / Terraform)
|
v
Stacks (VPC, ALB, ASG, RDS, SQS, etc.)
Implementation
- Define infrastructure as code:
- CloudFormation templates.
- AWS CDK (code‑driven).
- Or Terraform.
- Group related resources into stacks:
- Network stack (VPC, subnets, gateways).
- Application stack (ALB, ASG, ECS services).
- Data stack (RDS, DynamoDB, S3 buckets).
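To show what "infrastructure as code" looks like at its smallest, the sketch below builds a CloudFormation template for a single SQS queue as a Python dictionary. The logical ID `JobsQueue`, the queue naming scheme, and the timeout value are illustrative assumptions, not part of any existing stack.

```python
import json


def queue_stack_template(env):
    """Build a minimal CloudFormation template for one SQS queue per environment."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Job queue for the {env} environment",
        "Resources": {
            "JobsQueue": {
                "Type": "AWS::SQS::Queue",
                "Properties": {
                    "QueueName": f"jobs-{env}",
                    "VisibilityTimeout": 60,  # seconds; illustrative value
                },
            }
        },
        "Outputs": {
            # Ref on an SQS queue resolves to the queue URL.
            "JobsQueueUrl": {"Value": {"Ref": "JobsQueue"}},
        },
    }


def render(env):
    """Serialise the template so it can be written to a file and deployed."""
    return json.dumps(queue_stack_template(env), indent=2)
```

Calling `render("dev")` and `render("prod")` produces two templates that differ only in the environment-specific values, which is the repeatability the pattern is after.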
Benefits
- Version‑controlled infrastructure; easy rollback via stack changes.
- Consistent environments across regions and accounts.
- Easier onboarding: review templates instead of tribal knowledge.
Trade‑offs
- Learning curve for IaC tools.
- Large monolithic templates can become hard to maintain (use nested stacks/modules).
6. Network & security patterns (brief)
6.1 Three‑tier VPC Pattern
Problem
Separate public access, application logic, and data for security and clarity.
Architecture
Internet
|
[ALB] (public subnets)
|
[App instances] (private subnets)
|
[RDS/Aurora] (private subnets, no internet)
Implementation
- VPC with:
- Public subnets (ALB, NAT gateways).
- Private app subnets (EC2/ECS/EKS).
- Private DB subnets (RDS/Aurora only).
- Use security groups to allow:
- ALB → App.
- App → DB.
- No direct internet to DB.
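The security group rules above form an allow-list, which can be modelled as plain data and checked. In this sketch the tier names follow the diagram, while the port numbers and function names are assumptions chosen for illustration:

```python
# Each rule: (source tier, destination tier, port). Ports are illustrative assumptions.
RULES = [
    ("internet", "alb", 443),
    ("alb", "app", 8080),
    ("app", "db", 5432),
]


def is_allowed(source, destination, port, rules=RULES):
    """Security groups are allow-lists: traffic passes only if a rule matches."""
    return (source, destination, port) in rules


def reachable_from(destination, rules=RULES):
    """Return the set of tiers with any rule that allows traffic into `destination`."""
    return {src for src, dst, _ in rules if dst == destination}
```

A check such as `reachable_from("db") == {"app"}` could run in CI against the rule set to catch an accidental rule that opens the database tier to the internet.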
Benefits
- Clear separation of responsibilities and blast radius.
- Easier to reason about security.
Trade‑offs
- Slightly more configuration (subnets, route tables, SGs).
- NAT adds cost for outbound internet from private subnets.
7. Putting patterns together
In real systems, you rarely use just one pattern. For example, a production web application might combine:
- Three‑tier VPC + Multi‑AZ for network and availability.
- Scale Out pattern with Auto Scaling Groups or ECS services.
- Snapshot + Read Replica for data durability and read scaling.
- Queuing Chain for background workloads.
- Bootstrap + Stack Deployment for safe, repeatable operations.
Thinking in terms of patterns helps you:
- Communicate architecture clearly with other engineers.
- Reuse proven solutions instead of improvising from scratch.
- Evaluate trade‑offs (cost, complexity, performance, availability) more systematically.
As your systems grow, you can refine your own internal pattern catalogue, tuned to how your organisation uses AWS.