quyennv.com

Senior DevOps Engineer · Healthcare, Finance


AWS Cloud Design Patterns: Architecture and Practical Examples

#aws #cloud-architecture #design-patterns #system-design #devops


AWS Cloud Design Patterns are reusable solutions for common problems when you build systems on AWS: high availability, scaling, storage, batch processing, and operations. Instead of starting from a blank page, you combine patterns like building blocks.

The original catalogue of patterns is maintained at AWS Cloud Design Pattern. This post:

  • Lists the full pattern catalogue (grouped as on the reference site) with links to each pattern page.
  • Provides original, high-level descriptions and reference architectures for several of the most practically useful patterns.

When studying or designing systems, you can use this as a quick companion to the official pattern pages.


This section mirrors the structure of the AWS Cloud Design Pattern site and links to each individual pattern for deeper reading.

0.1 Basic Patterns

0.2 Patterns for High Availability

0.3 Patterns for Processing Dynamic Content

0.4 Patterns for Processing Static Content

0.5 Patterns for Uploading Data

0.6 Patterns for Relational Database

0.7 Patterns for Batch Processing

0.8 Pattern for Operation and Maintenance

0.9 Patterns for Network


This post then focuses on a subset of these patterns and gives original summaries and reference architectures for them.


1. Basic patterns

1.1 Snapshot Pattern (Data Backups)

Problem
You need consistent, low‑impact backups of data (volumes, databases) with fast restore options.

Architecture

Application -> EBS volumes / RDS instance
                      |
                      v
            Snapshots (stored in S3, managed by AWS)
                      |
         Backup plans / lifecycle policies (retention, copy)
  • Primary data lives on EBS volumes, RDS, or EFS.
  • Snapshots are stored in S3 (managed by AWS) and can be copied across regions/accounts.

Implementation

  • EBS volumes
    • Use CreateSnapshot (CLI/SDK) or Amazon Data Lifecycle Manager policies.
    • Tag volumes and apply policies per tag (Backup=Daily, Backup=Hourly).
  • RDS
    • Enable automated backups and create manual DB snapshots before risky changes.
  • AWS Backup
    • Create a backup plan with schedules and retention.
    • Attach resources via tags or explicit ARNs.
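
The tag-driven retention idea can be sketched in a few lines of Python. This is a minimal model of the selection logic only, not how Data Lifecycle Manager or AWS Backup work internally: the `Snapshot` class, the `RETENTION` table and its periods are assumptions made for illustration, and no AWS API is called.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention period per Backup tag value; tune to your own policy.
RETENTION = {"Hourly": timedelta(days=2), "Daily": timedelta(days=30)}


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    backup_tag: str       # value of the Backup tag copied from the volume
    created_at: datetime


def expired_snapshots(snapshots, now):
    """Return snapshots older than the retention period for their Backup tag."""
    expired = []
    for snap in snapshots:
        keep_for = RETENTION.get(snap.backup_tag)
        # Snapshots with an unknown tag are never selected, to fail safe.
        if keep_for is not None and now - snap.created_at > keep_for:
            expired.append(snap)
    return expired


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
snaps = [
    Snapshot("snap-a", "Daily", now - timedelta(days=45)),
    Snapshot("snap-b", "Daily", now - timedelta(days=3)),
    Snapshot("snap-c", "Hourly", now - timedelta(days=3)),
    Snapshot("snap-d", "Manual", now - timedelta(days=400)),
]
print([s.snapshot_id for s in expired_snapshots(snaps, now)])
```

Leaving untagged or unknown snapshots alone is a deliberate choice here: manual snapshots taken before risky changes should not disappear because of an automated policy.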

Benefits

  • Point‑in‑time restore with minimal downtime.
  • High durability (S3 under the hood).
  • Minimal code: mostly configuration.

Trade‑offs

  • Snapshots are taken at the infrastructure level and are only crash-consistent: for application-consistent backups you must coordinate with the application (e.g. flush and quiesce writes before the snapshot starts).
  • Cross‑region copies cost extra but are critical for DR.

1.2 Scale Up / Scale Out Pattern

Problem
Your load changes over time and you want to handle peaks without over‑provisioning.

Architecture

Clients
  |
 [ALB / NLB]
  |
 [Auto Scaling Group (EC2) or ECS/EKS service]
        ^
        |
  Scaling policies (CPU, QPS, custom metrics)

Implementation

  • Use Auto Scaling Groups for EC2 or ECS/EKS Service Auto Scaling for containers.
  • Attach policies:
    • Target utilisation (e.g. keep average CPU at 50%).
    • Request count per target.
    • Custom CloudWatch metrics (SQS queue depth, latency).
  • Configure min / max / desired capacity and cooldowns.
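
To build intuition for a target-utilisation policy, the arithmetic can be approximated as "scale capacity in proportion to how far the metric is from its target, then clamp to min/max". The function below is a simplified sketch under that assumption; the real service also applies cooldowns, instance warm-up and alarm evaluation periods, and the function name and parameters are hypothetical.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Approximate the capacity a target-utilisation policy would aim for."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Proportional estimate, rounded up so we never under-provision.
    raw = math.ceil(current * metric_value / target_value)
    # Never leave the configured min/max bounds.
    return max(min_size, min(max_size, raw))


print(desired_capacity(4, 80, 50, min_size=2, max_size=10))  # CPU above target
print(desired_capacity(4, 20, 50, min_size=2, max_size=10))  # CPU below target
print(desired_capacity(8, 90, 50, min_size=2, max_size=10))  # capped by max_size
```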

Benefits

  • Capacity tracks demand; you pay closer to what you use.
  • Works well with stateless applications and multi‑AZ deployments.

Trade‑offs

  • Instances need time to warm up; scale‑out is not instant.
  • Stateful workloads need extra work (sharding, sticky sessions, caches).

2. High‑availability patterns

2.1 Multi‑AZ / Multi‑Datacenter Pattern

Problem
You want your application to remain available even if an AZ fails.

Architecture

           Region
+-----------------------------+
|          VPC                |
|  +---------+  +---------+   |
|  |  AZ A   |  |  AZ B   |   |
|  |  EC2    |  |  EC2    |   |
|  |  RDS-A  |<-|-> RDS-B |   |
|  +---------+  +---------+   |
+--------------|--------------+
               |
             [ALB]
  • Compute (EC2/ECS/EKS) spread across multiple subnets in different AZs.
  • Multi‑AZ RDS or Aurora cluster for the database.
  • ALB/NLB distributes traffic across AZs.

Implementation

  • For web backends:
    • Auto Scaling Group with subnets from at least two AZs.
    • ALB with targets in each AZ.
  • For data:
    • Enable Multi‑AZ for RDS instances or use Aurora with multiple instances.
    • Store durable assets in S3; the Standard storage class already replicates objects across multiple AZs.
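
A quick way to reason about sizing is to ask how many instances each AZ needs so that the remaining AZs can still carry the full load after one AZ is lost. The helper below is a back-of-the-envelope sketch; `instances_per_az` is a hypothetical name and the model assumes identical instances and evenly balanced traffic.

```python
import math


def instances_per_az(required_capacity, az_count):
    """Instances to run in each AZ so that losing any one AZ still leaves
    at least `required_capacity` instances in the remaining AZs."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive an AZ failure")
    return math.ceil(required_capacity / (az_count - 1))


for azs in (2, 3):
    per_az = instances_per_az(4, azs)
    print(f"{azs} AZs: {per_az} per AZ, {per_az * azs} total")
```

With a required capacity of 4, two AZs need 8 instances in total while three AZs need only 6, which is why spreading across more AZs can reduce the cost of the spare capacity.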

Benefits

  • Survives AZ‑level failures with minimal/no downtime.
  • Separates infrastructure faults from application faults.

Trade‑offs

  • Slightly higher cost (extra instances in another AZ).
  • You still need backups and cross‑region DR for region‑wide outages.

2.2 Multi‑Region Read Pattern (for latency & DR)

Problem
Users are globally distributed. A single region causes high latency and is a single point of failure.

Architecture (read‑mostly workloads)

Region A (primary)            Region B (secondary)
-------------------           --------------------
  Writes + Reads               Read replicas only
  [RDS/Aurora writer] ---> [Aurora global / read replica]
          |                              |
        [ALB]                          [ALB]
          ^                              ^
       Users (A)                     Users (B)

Implementation

  • For relational data:
    • Use Aurora Global Database or cross‑region RDS read replicas.
  • For static/content data:
    • Rely on S3 + CloudFront (global edge network).
  • DNS:
    • Use Route 53 latency‑based routing or failover routing between regions.
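
The routing decision itself can be modelled as "pick the healthy region with the lowest latency, and fall back to whatever is still healthy". The sketch below illustrates that idea only; it is not how Route 53 is implemented, and the region names and latency figures are made up.

```python
def pick_region(latency_ms, healthy):
    """Return the healthy region with the lowest measured latency."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies as seen by a user close to region-b.
latency = {"region-a": 180, "region-b": 40}

print(pick_region(latency, healthy={"region-a", "region-b"}))  # normal operation
print(pick_region(latency, healthy={"region-a"}))              # region-b is down
```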

Benefits

  • Lower latency for regional users.
  • Foundation for disaster recovery (promote secondary region).

Trade‑offs

  • Increased complexity, especially for writes across regions (eventual consistency, conflict resolution).
  • More infrastructure cost.

3. Data & storage patterns

3.1 Web Storage / Static Hosting Pattern

Problem
You want simple, highly available hosting for static sites or assets.

Architecture

Users
  |
 [CloudFront CDN]  <---->  [S3 bucket: static content]

Implementation

  • Put all static content (HTML, CSS, JS, images) in S3.
  • Configure CloudFront:
    • Origin = S3 bucket (or S3 static website endpoint).
    • Behaviours with long TTL for immutable assets.
  • Use Route 53 + ACM to serve https://yourdomain.com with a custom cert. For CloudFront, the ACM certificate must be requested in us-east-1.
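
"Long TTL for immutable assets" usually means setting a `Cache-Control` header per object at upload time. The sketch below assumes a build that puts a content hash in asset file names (for example `app.3f9a1c2b.js`); the regular expression, the file extensions and the TTL values are illustrative assumptions, not recommendations from the pattern catalogue.

```python
import re

# Matches names such as "app.3f9a1c2b.js": a hex content hash before the extension.
FINGERPRINTED = re.compile(r"\.[0-9a-f]{8,}\.(?:js|css|png|jpg|svg|woff2)$")


def cache_control_for(key):
    """Choose a Cache-Control header for an object key in the bucket."""
    if FINGERPRINTED.search(key):
        # The name changes whenever the content changes, so cache for a year.
        return "public, max-age=31536000, immutable"
    # HTML and unversioned files must be revalidated quickly after a deploy.
    return "public, max-age=60"


for key in ("assets/app.3f9a1c2b.js", "assets/logo.png", "index.html"):
    print(key, "->", cache_control_for(key))
```

Because fingerprinted files never change under the same name, this approach reduces how often explicit CloudFront invalidations are needed.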

Benefits

  • Zero servers to manage.
  • Global performance with CloudFront edge locations.
  • Very low cost for many workloads.

Trade‑offs

  • Only for static content. Dynamic features must call APIs (Lambda, API Gateway, etc.).
  • Cache invalidation policies must be planned.

3.2 Read Replica Pattern (Relational DB)

Problem
You have read‑heavy workloads causing load on the primary database.

Architecture

           Writes        Reads
App  ----------------> [Primary DB]
                       /   |   \
                      v    v    v
                  [Read replicas]

Implementation

  • In RDS, create one or more read replicas.
  • In Aurora, use the reader endpoint to load-balance across replicas, or instance/custom endpoints to target specific replicas.
  • Route:
    • OLTP reads that need fresh data → primary.
    • Reports, dashboards, background jobs → replicas.
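
In application code, the routing rule often ends up as a small helper that hands out the right endpoint. The class below is a minimal sketch of that idea; `ReadWriteRouter`, its parameters and the endpoint names are hypothetical, and a real implementation would return database connections rather than strings.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes and freshness-sensitive reads to the primary,
    and spread all other reads across replicas round-robin."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, *, write=False, needs_fresh_data=False):
        if write or needs_fresh_data or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("primary.db", ["replica-1.db", "replica-2.db"])
print(router.endpoint_for(write=True))             # insert/update
print(router.endpoint_for(needs_fresh_data=True))  # read-your-own-write
print(router.endpoint_for())                       # dashboard query
print(router.endpoint_for())                       # next replica in the rotation
```

Making `needs_fresh_data` an explicit argument forces each call site to state whether it can tolerate replication lag.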

Benefits

  • Offloads heavy read queries from the primary.
  • Horizontal read scalability.

Trade‑offs

  • Replication lag → replicas are eventually consistent.
  • Need clear guidance in code: which queries can tolerate stale data.

4. Batch & integration patterns

4.1 Queuing Chain Pattern (Loose Coupling with SQS)

Problem
Producers and consumers should not depend on each other’s availability or throughput.

Architecture

Producers          Queue             Workers (auto-scaled)
   |                |                        |
   v                v                        v
[App / ETL] ---> [SQS]  --->  [Lambda / ECS / Batch]

Implementation

  • SQS:
    • Standard queue for at‑least‑once delivery.
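    • FIFO queue if ordering and exactly-once processing are crucial (with caveats: deduplication only covers a 5-minute window, and throughput limits are lower than for standard queues).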
  • Consumers:
    • Lambda with SQS trigger, or
    • Containerised worker service (ECS/EKS) polling SQS.
  • Scaling:
    • Auto scale workers based on queue metrics (e.g. ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage).
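
One common way to turn queue depth into a scaling decision is to fix an acceptable backlog per worker and size the fleet from that. The function below is a sketch of that calculation only; `desired_workers` and its parameters are hypothetical, and the backlog figure would come from your own latency target and per-message processing time.

```python
import math


def desired_workers(visible_messages, backlog_per_worker, min_workers, max_workers):
    """Workers needed so each one has at most `backlog_per_worker` messages waiting."""
    if backlog_per_worker <= 0:
        raise ValueError("backlog_per_worker must be positive")
    needed = math.ceil(visible_messages / backlog_per_worker)
    # Keep at least min_workers warm and never exceed the budgeted maximum.
    return max(min_workers, min(max_workers, needed))


print(desired_workers(0, 100, min_workers=1, max_workers=20))     # idle queue
print(desired_workers(950, 100, min_workers=1, max_workers=20))   # moderate backlog
print(desired_workers(5000, 100, min_workers=1, max_workers=20))  # spike, capped
```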

Benefits

  • Decouples producers from consumers; spikes are buffered.
  • Simple to scale consumers horizontally.
  • Natural fit for event‑driven architectures.

Trade‑offs

  • At‑least‑once delivery → consumers must be idempotent.
  • Monitoring and DLQs are needed to catch poison messages.
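
Idempotency can be as simple as remembering which message IDs have already been handled. The class below is a minimal in-memory sketch of that idea; `IdempotentConsumer` is a hypothetical name, and real workers would need a shared, durable store for the seen IDs because several workers may receive the same message.

```python
class IdempotentConsumer:
    """Run the handler at most once per message ID, even if the queue
    delivers the same message more than once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # in-memory only; a sketch, not production storage

    def process(self, message_id, body):
        if message_id in self._seen:
            return False  # duplicate delivery: skip the side effect
        self._handler(body)
        # Record the ID only after the handler succeeded, so failures are retried.
        self._seen.add(message_id)
        return True


handled = []
consumer = IdempotentConsumer(handled.append)
print(consumer.process("m1", "charge order 42"))  # first delivery
print(consumer.process("m1", "charge order 42"))  # redelivery of the same message
print(handled)
```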

4.2 Priority Queue Pattern

Problem
Not all tasks are equal — some must be processed before others.

Architecture

High-priority producers  ---> [SQS High]
Normal producers         ---> [SQS Normal]
                             /          \
                    [Workers for High]  [Workers for Normal]

Implementation

  • Create separate queues (e.g. queue-high, queue-normal).
  • Have:
    • Workers dedicated to the high queue.
    • Workers that can pull from both but prefer high priority when available.
  • Use CloudWatch metrics and autoscaling per queue.
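
The "prefer high priority when available" rule for shared workers comes down to checking the high queue first on every poll. The sketch below uses in-memory `deque` objects as stand-ins for the two SQS queues; the function name and queue contents are illustrative assumptions.

```python
from collections import deque


def next_task(high, normal):
    """Return (priority, task) from the high queue if it has work,
    otherwise from the normal queue, or None when both are empty."""
    if high:
        return ("high", high.popleft())
    if normal:
        return ("normal", normal.popleft())
    return None


high = deque(["h1"])
normal = deque(["n1", "n2"])

print(next_task(high, normal))  # the high queue is drained first
print(next_task(high, normal))  # then normal work proceeds
high.append("h2")               # an urgent task arrives mid-stream
print(next_task(high, normal))  # and jumps ahead of the remaining normal task
print(next_task(high, normal))
```

A strict preference like this can starve the normal queue under sustained high-priority load, which is one reason to keep some workers dedicated to each queue.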

Benefits

  • Predictable latency for critical tasks.
  • Isolation between workloads at the queue level.

Trade‑offs

  • Slightly more operational overhead (more queues, policies, metrics).
  • Requires careful worker logic when consuming from multiple queues.

5. Operations & maintenance patterns

5.1 Bootstrap Pattern (Automatic Instance Initialisation)

Problem
You want new servers to configure themselves automatically when they start, rather than baking all config into AMIs.

Architecture

Launch Template / ASG
   |
   v
EC2 instance
   |
   +--> user-data script
          |
          +--> S3 / Parameter Store / SSM (download config, register, install agents)

Implementation

  • Use user‑data scripts in launch templates:
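#!/bin/bash
set -euo pipefail  # abort on the first failed step so the app never starts without its config
mkdir -p /etc/app
# Fetch the (possibly encrypted) config from Parameter Store using the instance role
aws ssm get-parameter --name /app/config --with-decryption --query 'Parameter.Value' --output text > /etc/app/config.yml
systemctl start app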
  • Store configuration in:
    • Systems Manager Parameter Store or Secrets Manager.
    • S3 (for larger files or templates).
  • Optionally, use Systems Manager (Run Command or State Manager, via the SSM Agent) to push further config after boot.
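
Bootstrap steps run early in the life of an instance, so a fetch from a config store may fail transiently (this is an assumption about the environment, not something the pattern guarantees). A small retry helper with exponential backoff is one way to make the step more robust; the names `fetch_with_retry` and `flaky_fetch` below are hypothetical, and the fake fetch stands in for a real call to Parameter Store or S3.

```python
import time


def fetch_with_retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the boot fail loudly
            sleep(base_delay * 2 ** attempt)


# Fake config source that fails twice before answering.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("config store not reachable yet")
    return "log_level: info"


delays = []  # record the waits instead of actually sleeping
config = fetch_with_retry(flaky_fetch, sleep=delays.append)
print(config, delays)
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.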

Benefits

  • New instances/self‑healing nodes are ready without manual steps.
  • Decouples image baking from configuration.
  • Works well with Auto Scaling.

Trade‑offs

  • Boot time increases slightly due to setup work.
  • You must secure configuration sources (IAM roles, encryption).

5.2 Stack Deployment Pattern (Infrastructure as Code)

Problem
You want repeatable environments (dev/test/prod) and controlled infrastructure changes.

Architecture

Templates (CloudFormation / CDK / Terraform)
        |
        v
   Stacks (VPC, ALB, ASG, RDS, SQS, etc.)

Implementation

  • Define infrastructure as code:
    • CloudFormation templates.
    • AWS CDK (code‑driven).
    • Or Terraform.
  • Group related resources into stacks:
    • Network stack (VPC, subnets, gateways).
    • Application stack (ALB, ASG, ECS services).
    • Data stack (RDS, DynamoDB, S3 buckets).
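
Splitting infrastructure into stacks introduces an ordering question: which stack must exist before another can be deployed. The sketch below models the three stacks above as a dependency graph and derives a deploy order with the standard library's `graphlib` (Python 3.9+). The dependency edges are an assumption for illustration; in practice the IaC tool resolves them from exports, outputs or module references.

```python
from graphlib import TopologicalSorter

# Each stack maps to the set of stacks it depends on (assumed for this example).
STACK_DEPENDENCIES = {
    "network": set(),
    "data": {"network"},
    "application": {"network", "data"},
}


def deploy_order(dependencies):
    """Return the stacks in an order where every dependency is deployed first."""
    return list(TopologicalSorter(dependencies).static_order())


order = deploy_order(STACK_DEPENDENCIES)
print("deploy:  ", order)
print("teardown:", list(reversed(order)))
```

Tearing down in the reverse order avoids deleting a stack that others still depend on.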

Benefits

  • Version‑controlled infrastructure; easy rollback via stack changes.
  • Consistent environments across regions and accounts.
  • Easier onboarding: review templates instead of tribal knowledge.

Trade‑offs

  • Learning curve for IaC tools.
  • Large monolithic templates can become hard to maintain (use nested stacks/modules).

6. Network & security patterns (brief)

6.1 Three‑tier VPC Pattern

Problem
Separate public access, application logic, and data for security and clarity.

Architecture

Internet
   |
[ALB]  (public subnets)
   |
[App instances] (private subnets)
   |
[RDS/Aurora]    (private subnets, no internet)

Implementation

  • VPC with:
    • Public subnets (ALB, NAT gateways).
    • Private app subnets (EC2/ECS/EKS).
    • Private DB subnets (RDS/Aurora only).
  • Use security groups to allow:
    • ALB → App.
    • App → DB.
    • No direct internet to DB.
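
Before writing any templates, it helps to lay out the subnet CIDR blocks for each tier and AZ. The sketch below carves a VPC range into equal blocks with the standard library's `ipaddress` module; the `10.0.0.0/16` range, the `/24` subnet size, and the tier and AZ labels are assumptions chosen for illustration.

```python
import ipaddress
from itertools import product


def plan_subnets(vpc_cidr, tiers, azs, new_prefix=24):
    """Assign one CIDR block per (tier, AZ) pair, in order, from the VPC range."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    return {f"{tier}-{az}": str(next(blocks)) for tier, az in product(tiers, azs)}


plan = plan_subnets("10.0.0.0/16", tiers=["public", "app", "db"], azs=["a", "b"])
for name, cidr in plan.items():
    print(f"{name:10} {cidr}")
```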

Benefits

  • Clear separation of responsibilities and blast radius.
  • Easier to reason about security.

Trade‑offs

  • Slightly more configuration (subnets, route tables, SGs).
  • NAT adds cost for outbound internet from private subnets.

7. Putting patterns together

In real systems, you rarely use just one pattern. For example, a production web application might combine:

  • Three‑tier VPC + Multi‑AZ for network and availability.
  • Scale Out pattern with Auto Scaling Groups or ECS services.
  • Snapshot + Read Replica for data durability and read scaling.
  • Queuing Chain for background workloads.
  • Bootstrap + Stack Deployment for safe, repeatable operations.

Thinking in terms of patterns helps you:

  • Communicate architecture clearly with other engineers.
  • Reuse proven solutions instead of improvising from scratch.
  • Evaluate trade‑offs (cost, complexity, performance, availability) more systematically.

As your systems grow, you can refine your own internal pattern catalogue, tuned to how your organisation uses AWS.
