quyennv.com

Senior DevOps Engineer · Healthcare, Singapore

Terraform: Architecture, How It Works, and Best Practices

#terraform #iac #infrastructure-as-code #devops #cloud

Terraform is an open-source Infrastructure as Code (IaC) tool by HashiCorp. You describe infrastructure in HCL (HashiCorp Configuration Language), and Terraform creates or updates resources in cloud providers (AWS, Azure, GCP) or other APIs (Kubernetes, DNS, etc.) so the real world matches your configuration.

Why Terraform?

  • Declarative: You define the desired state; Terraform figures out create/update/delete.
  • Multi-cloud and multi-service: One tool and one language for clouds, Kubernetes, databases, and more via providers.
  • State: Terraform keeps a state file of what it manages, which enables drift detection and safe updates.
  • Plan before apply: terraform plan shows the exact changes; you apply only after review.

Terraform architecture

Terraform has three main pieces: the core, providers, and state.

+--------------------------- TERRAFORM CORE (CLI) ---------------------------+
|  +-------------+  +-------------+  +-------------+  +-----------------+    |
|  |   Config    |  |  Graph      |  |  State      |  |  Provider       |    |
|  |   Loader    |->|  Builder    |->|  Manager    |<-|  Plugins        |    |
|  | (.tf files) |  | (dependency)|  | (read/write)|  | (AWS, Azure...) |    |
|  +-------------+  +-------------+  +------+------+  +--------+--------+    |
|         |                 |               |                   |            |
|         v                 v               v                   v            |
|  +---------------------------------------------------------------------+   |
|  |  Execution: plan (state+config -> diff) or apply (call providers)   |   |
|  +---------------------------------------------------------------------+   |
+----------------------------------------------------------------------------+
        |                        |                              |
        v                        v                              v
   Your .tf files          State (local file              Cloud / API
   (desired state)         or remote backend)            (actual resources)

Core components

  • Config loader: Reads .tf and .tf.json files, resolves variables and modules, and builds an internal representation of the desired infrastructure.
  • Graph builder: Builds a dependency graph of resources (e.g. subnet before VM). Plan and apply execute in an order that respects dependencies.
  • State manager: Reads and writes state: a mapping of your config (resource addresses) to real resource IDs and attributes. Used for drift detection and update planning.
  • Execution engine: For plan, compares desired (config) vs current (state) and asks providers to refresh; produces a change set. For apply, invokes provider APIs to create/update/delete resources, then updates state.

Providers

  • Providers are plugins that talk to a specific API (e.g. AWS, Azure, Kubernetes). Each resource type (e.g. aws_instance, azurerm_resource_group) belongs to a provider.
  • Terraform downloads providers at terraform init and uses them during plan and apply.
  • You declare which providers and versions you need; the core stays small and generic; all API logic lives in providers.
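When you need the same provider with different settings (e.g. two AWS regions), provider aliases let several configurations coexist. A minimal sketch — the regions and bucket name here are illustrative:

```hcl
# Default AWS provider plus an aliased one for a second region
provider "aws" {
  region = "ap-southeast-1"
}

provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

# A resource opts into the aliased provider explicitly
resource "aws_s3_bucket" "replica" {
  provider = aws.use1
  bucket   = "example-app-replica"
}
```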

State

  • State holds: resource address → provider’s resource ID and stored attributes. Terraform uses it to know what it already created and what to change or destroy.
  • The backend determines where state lives: local (a file on disk) or remote (e.g. S3 + DynamoDB, Azure Storage, Terraform Cloud). Remote backends enable team collaboration, locking, and better security (no state on laptops).
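One way stacks consume each other's state is the terraform_remote_state data source. A sketch, assuming a networking stack already writes its state to the S3 key shown and exposes a vpc_id output (bucket and key names are hypothetical):

```hcl
# Read outputs from another stack's remote state
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-company-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Use an output exposed by the networking stack
locals {
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```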

How Terraform works

1. Write configuration (.tf)

You define resources, data sources, variables, and outputs in HCL:

# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_s3_bucket" "app" {
  bucket = "${var.project_name}-app-${var.environment}"

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

2. terraform init

  • Downloads the provider plugins (and module sources if you use modules).
  • Initializes the backend (e.g. configures remote state).
  • Run once per working directory (and after adding providers or changing backend).
terraform init

3. terraform plan

  • Refreshes state: Asks providers for the current real-world attributes of managed resources (on by default; skip with -refresh=false).
  • Plans: Compares config (desired) with state (current) and computes create, update, or destroy actions.
  • Output: Human-readable plan; no changes are made. Use this to review before apply.
terraform plan -out=tfplan

4. terraform apply

  • Runs the planned changes: calls provider APIs to create/update/delete resources.
  • Updates state after each successful change.
  • With -auto-approve it skips the confirmation prompt (useful in CI); otherwise Terraform asks for confirmation.
terraform apply tfplan
# or
terraform apply

5. terraform destroy

  • Plans and applies the destruction of all resources in state (in reverse dependency order). Use with care; often you want to destroy only a subset (-target, or a separate state).
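To guard critical resources against an accidental destroy, a lifecycle block can refuse the operation outright. A sketch (the resource and bucket name are illustrative):

```hcl
# prevent_destroy makes Terraform error out instead of destroying this resource
resource "aws_s3_bucket" "critical_data" {
  bucket = "example-critical-data"

  lifecycle {
    prevent_destroy = true
  }
}
```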

End-to-end flow

  1. You edit .tf (desired state).
  2. terraform init ensures providers and backend are ready.
  3. terraform plan refreshes state, diffs desired vs current, outputs the change set.
  4. terraform apply executes the change set via providers and writes new state.
  5. State is the source of truth for “what Terraform manages”; real infrastructure is the source of truth for “what actually exists.” Terraform reconciles the two.

Best practices

1. State management

  • Use a remote backend in production (e.g. S3 + DynamoDB for locking, Azure Storage, GCS, or Terraform Cloud). Avoid long-lived local state.
  • Enable state locking so two runs never apply at the same time (DynamoDB for S3 backend, etc.).
  • Never commit state with secrets or sensitive data to Git. Prefer remote state and restrict access with IAM.
  • Separate state per environment or layer (e.g. one state for networking, one for compute) to limit blast radius and allow different lifecycles.
# backend.tf (example: S3 + DynamoDB for AWS)
terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "prod/app/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

2. Use modules

  • Modules are reusable bundles of resources (e.g. “VPC module,” “ECS service module”). Use them to avoid copy-paste and to standardize patterns.
  • Compose environments from the same module with different variables (e.g. dev vs prod).
  • Prefer HashiCorp and community modules where they fit (e.g. terraform-aws-modules/vpc/aws) and wrap or extend them if needed.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  name = "${var.project_name}-vpc"
  cidr = var.vpc_cidr

  azs             = var.azs
  private_subnets = var.private_subnet_cidrs
  public_subnets  = var.public_subnet_cidrs

  enable_nat_gateway = true
  single_nat_gateway = var.environment != "prod"
}
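Wrapping a registry module in a local module is one way to bake in company defaults. A sketch, assuming a modules/network directory in your repo (variable declarations omitted for brevity):

```hcl
# modules/network/main.tf — thin wrapper that standardizes defaults
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  name            = var.name
  cidr            = var.cidr
  azs             = var.azs
  private_subnets = var.private_subnets
  public_subnets  = var.public_subnets

  # Company default: always NAT, never optional
  enable_nat_gateway = true
}

# Root config calls the wrapper like any other module:
# module "network" {
#   source = "./modules/network"
#   name   = "${var.project_name}-vpc"
# }
```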

3. Pin provider and Terraform versions

  • In the terraform block, pin required_providers (and versions) so everyone and CI use the same provider behavior.
  • Pin the Terraform version as well when multiple people or CI run the same config (e.g. required_version = ">= 1.5.0").
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

4. Variables and outputs

  • Use variables for environment-specific or sensitive values (region, env name, instance type). Give defaults where it makes sense; use variable validation to fail fast on bad input.
  • Use outputs to expose IDs, ARNs, and endpoints to other Terraform configs (via remote state) or to CI/CD.
  • Sensitive variables: Mark as sensitive = true so they are redacted in plan/apply logs. Note that sensitive values are still stored in plaintext in state, so protect the state file; store values in env vars or a secret manager, not in .tf files.
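A minimal sketch of a variable with a default plus outputs that expose identifiers (the bucket reference assumes the aws_s3_bucket.app resource from the earlier example):

```hcl
# variables.tf — default where it makes sense
variable "aws_region" {
  type        = string
  description = "AWS region to deploy into"
  default     = "ap-southeast-1"
}

# outputs.tf — expose identifiers for other stacks or CI/CD
output "app_bucket_name" {
  value       = aws_s3_bucket.app.bucket
  description = "Name of the application S3 bucket"
}

output "app_bucket_arn" {
  value       = aws_s3_bucket.app.arn
  description = "ARN of the application S3 bucket"
}
```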

5. Naming and structure

  • Resource names: Use a consistent scheme (e.g. {project}-{env}-{resource}) and tags (Environment, Project, ManagedBy) so you can identify and govern resources.
  • File layout: Split by concern (e.g. main.tf, variables.tf, outputs.tf, versions.tf, backend.tf) or by layer (e.g. network.tf, compute.tf). Keep modules in a modules/ directory.
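With the AWS provider, default_tags applies the tagging scheme to every taggable resource, so individual resources cannot forget it. A sketch using the variables from the earlier examples:

```hcl
provider "aws" {
  region = var.aws_region

  # Applied to all taggable resources managed by this provider
  default_tags {
    tags = {
      Project     = var.project_name
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}
```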

6. Security (overview)

  • No secrets in code: Use TF_VAR_* or a secret store; mark variables as sensitive.
  • Least privilege: Run Terraform (and CI) with IAM/roles that have only the permissions needed for the resources you manage.
  • Private state: Store state in a bucket/container with encryption and access controls; use a private backend or VPC endpoints where possible.
  • Review plans: In CI, run terraform plan and require approval before apply; consider policy as code (e.g. Sentinel, OPA) for guardrails.

See Advanced: Security and hardening and Secure Terraform code examples below for details and ready-to-use patterns.

7. Workspaces vs separate state

  • Workspaces (e.g. default, dev, prod) use one backend with different state keys. They are simple but can be confusing; naming and discipline matter.
  • Separate directories or repos per environment with their own state (and backend config) give clear separation and are often easier to reason about for prod vs non-prod.
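If you do use workspaces, terraform.workspace is available inside config to key names and sizing off the current workspace. A sketch (resource and variable names are illustrative):

```hcl
# Workspace commands (run outside the config):
#   terraform workspace new dev
#   terraform workspace select prod
#   terraform workspace show

resource "aws_s3_bucket" "app" {
  # e.g. myproject-dev-app vs myproject-prod-app
  bucket = "${var.project_name}-${terraform.workspace}-app"
}
```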

8. Plan and apply in CI

  • Run terraform fmt -check and terraform validate in CI.
  • Run terraform plan on every change and store the plan artifact; apply only after review (or from a protected branch). Use -target or -destroy sparingly and with explicit approval.

Advanced: Security and hardening

Secrets management

  • Never commit secrets: No passwords, API keys, or tokens in .tf, .tfvars, or state. Use environment variables (TF_VAR_*), a secret manager (HashiCorp Vault, AWS Secrets Manager), or CI secrets.
  • Mark sensitive variables: Set sensitive = true on variables and outputs so Terraform redacts them in logs and in terraform plan output.
  • Sensitive in state: State can contain sensitive values (e.g. a DB password in a resource attribute). Always use a remote backend with encryption and strict access control; never commit state.
  • Provider credentials: Prefer IAM roles (e.g. EC2 instance profile, OIDC in CI) over long-lived access keys. If you use keys, inject them via env vars, not files in the repo.

State security

  • Remote backend only in production: Use S3, Azure Storage, GCS, or Terraform Cloud with encryption at rest.
  • State locking: Prevent concurrent apply (e.g. DynamoDB for S3 backend). Reduces risk of state corruption and conflicting changes.
  • Access control: Restrict who can read/write state (IAM, RBAC). Prefer separate state per environment and least-privilege roles for CI.
  • Encryption: Enable server-side encryption on the state bucket/container; use KMS where available for audit and key control.
  • Private access: Use VPC endpoints or private connectivity to the state backend so state does not traverse the public internet.

Least privilege for Terraform execution

  • Scoped IAM: The identity that runs terraform apply (user or CI role) should have only the permissions required to create/update/delete the resources in your config. Avoid broad policies like *:* on a whole service.
  • Policy documents in Terraform: Define IAM policies and roles in Terraform and attach them to the runner; use conditions (e.g. aws:RequestedRegion, aws:SourceAccount) to tighten scope.
  • Separate roles per stack: Use different roles for dev vs prod so a compromise or mistake in dev does not affect production.
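A sketch of a region-scoped policy for a Terraform runner, using jsonencode and an aws:RequestedRegion condition (the actions, region, and policy name here are illustrative; scope them to what your config actually manages):

```hcl
resource "aws_iam_policy" "terraform_runner" {
  name = "terraform-runner-app-prod"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "ec2:Describe*",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:CreateTags",
      ]
      Resource = "*"
      # Refuse use outside the one region this stack manages
      Condition = {
        StringEquals = {
          "aws:RequestedRegion" = "ap-southeast-1"
        }
      }
    }]
  })
}
```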

Supply chain and integrity

  • Provider version pinning: Pin required_providers with a version constraint (e.g. ~> 5.0) and run terraform init -upgrade in a controlled way. Use .terraform.lock.hcl and commit it so everyone uses the same provider binaries.
  • Module sources: Prefer modules from trusted registries (HashiCorp, official); pin version or commit. For private modules, use a private registry or tagged Git refs.
  • Verification: Terraform can verify provider checksums (in the lock file). In locked-down environments, use a private provider mirror and verify hashes.

Policy as code and scanning

  • Pre-apply checks: Use tools like tfsec, Checkov, or Trivy to scan .tf for misconfigurations (e.g. public S3 buckets, open security groups, unencrypted storage).
  • Terraform Cloud / Enterprise: Sentinel policies can enforce rules (e.g. “no instance without a tag”, “only approved instance types”) before an apply.
  • OPA (Open Policy Agent): Integrate with CI to evaluate Terraform plan or state against custom policies (e.g. “no new public IPs in prod”).

Secure resource defaults

  • Encryption: Enable encryption at rest (S3, EBS, RDS, etc.) and in transit (TLS) by default in your modules and examples.
  • Networking: Prefer private subnets and security groups that allow only necessary ingress/egress; avoid 0.0.0.0/0 unless required and documented.
  • Logging and auditing: Enable CloudTrail, flow logs, and resource-level logging where relevant so you can audit changes and investigate incidents.
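As a sketch, VPC flow logs to S3 can be enabled alongside the networking module (the VPC reference assumes the earlier module example; the logs bucket is hypothetical and defined elsewhere):

```hcl
# Capture accepted and rejected traffic for audit and incident response
resource "aws_flow_log" "vpc" {
  vpc_id               = module.vpc.vpc_id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = aws_s3_bucket.flow_logs.arn  # assumes a dedicated logs bucket
}
```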

Secure Terraform code examples

The following snippets show patterns to enhance security in your Terraform source.

1. Sensitive variables and validation

# variables.tf
variable "db_password" {
  description = "Database master password"
  type        = string
  sensitive   = true

  validation {
    condition     = length(var.db_password) >= 16
    error_message = "Password must be at least 16 characters."
  }
}

variable "environment" {
  type        = string
  description = "Environment name (dev, staging, prod)"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
  • Use sensitive = true so the value is never printed in plan/apply or in logs.
  • Use validation blocks to fail fast on invalid or dangerous values (e.g. prod safeguards).

2. Remote backend with encryption and locking

# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "ap-southeast-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:ap-southeast-1:123456789012:key/..."
  }
}
  • encrypt = true enables server-side encryption; kms_key_id uses a customer-managed key for audit and control.
  • dynamodb_table enables state locking so concurrent applies are blocked.

3. No hardcoded credentials; use env or data source

# Bad: never do this
# provider "aws" {
#   access_key = "AKIA..."
#   secret_key = "..."
# }

# Good: credentials from environment (or IAM role if on EC2/ECS/Lambda)
provider "aws" {
  region = var.aws_region
  # access_key and secret_key omitted: use AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, or IAM role
}

# Good: fetch existing secret from a secret manager instead of passing raw values
data "aws_secretsmanager_secret_version" "db" {
  secret_id = var.db_secret_arn
}

locals {
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}
  • Provider credentials from env or instance/profile; never in .tf or .tfvars committed to Git.
  • Use data sources (e.g. Secrets Manager, SSM Parameter Store) to pull secrets at apply time instead of TF_VAR_* when the secret already lives in a secure store.

4. Secure S3 bucket (encryption, versioning, block public access)

resource "aws_s3_bucket" "app_data" {
  bucket = "${var.project_name}-${var.environment}-data"

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_s3_bucket_versioning" "app_data" {
  bucket = aws_s3_bucket.app_data.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_kms_key" "s3" {
  description             = "KMS key for ${var.project_name} S3 bucket encryption"
  deletion_window_in_days = 10
  enable_key_rotation     = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "app_data" {
  bucket = aws_s3_bucket.app_data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.s3.arn
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_public_access_block" "app_data" {
  bucket = aws_s3_bucket.app_data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
  • Versioning for recovery; KMS encryption for at-rest security; public_access_block to prevent accidental public exposure.

5. Restrictive security group (least privilege)

resource "aws_security_group" "app" {
  name_prefix = "${var.project_name}-app-"
  vpc_id      = module.vpc.vpc_id
  description = "Application tier; allow only from ALB and required egress"

  ingress {
    description     = "HTTPS from ALB"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    description = "HTTPS to internet (e.g. APIs)"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Name = "${var.project_name}-app"
  }
}
  • Ingress only from the ALB security group (no 0.0.0.0/0); egress limited to what the app needs (here HTTPS). Tighten egress further (e.g. VPC endpoints, specific prefixes) where possible.

6. Outputs that must stay secret

output "db_endpoint" {
  value       = aws_db_instance.main.endpoint
  description = "Database endpoint"
}

output "db_password" {
  value       = aws_db_instance.main.password
  sensitive   = true
  description = "Database password (redacted in logs)"
}
  • Mark any output that could contain secrets as sensitive = true so it is never printed in logs or in the plan.

7. Version and lock file (reproducible, verifiable runs)

# versions.tf
terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
  • Pin required_version and required_providers; run terraform init and commit .terraform.lock.hcl so all environments and CI use the same provider versions and checksums.

Summary

  • Architecture: Core (config, graph, state, execution) + providers (plugins) + state (mapping from config to real IDs).
  • Flow: init → plan (diff desired vs state) → apply (call providers, update state). State is the link between config and real resources.
  • Best practices: Remote state with locking; modules; version pinning; variables/outputs and sensitive handling; consistent naming and tags; no secrets in code; review plans and least-privilege IAM.
  • Security: No secrets in code; sensitive variables and outputs; encrypted remote state and locking; least-privilege IAM for Terraform; supply chain (pinned providers, lock file); policy/scanning (tfsec, Sentinel, OPA); secure defaults (encryption, restrictive security groups).
  • Secure code: Use validation and sensitive on variables; backend with encryption and DynamoDB lock; credentials from env or IAM; data sources for secrets; S3 encryption and public access block; restrictive security groups; sensitive outputs; pin versions and commit the lock file.

Terraform lets you manage cloud and other APIs declaratively. Combining its architecture and workflow with security best practices and secure Terraform code patterns (secrets handling, state protection, least privilege, and scanning) keeps infrastructure safe and maintainable as you scale.
