# Terraform: Architecture, How It Works, and Best Practices

#terraform #iac #infrastructure-as-code #devops #cloud
Terraform is an open-source Infrastructure as Code (IaC) tool by HashiCorp. You describe infrastructure in HCL (HashiCorp Configuration Language), and Terraform creates or updates resources in cloud providers (AWS, Azure, GCP) or other APIs (Kubernetes, DNS, etc.) so the real world matches your configuration.
## Why Terraform?

- Declarative: You define the desired state; Terraform figures out create/update/delete.
- Multi-cloud and multi-service: One tool and one language for clouds, Kubernetes, databases, and more via providers.
- State: Terraform keeps a state file of what it manages, enabling drift detection and safe updates.
- Plan before apply: `terraform plan` shows the exact changes; you apply only after review.
## Terraform architecture

Terraform has three main pieces: the core, providers, and state.
```
+--------------------------- TERRAFORM CORE (CLI) ---------------------------+
| +-------------+ +-------------+ +-------------+ +-----------------+ |
| | Config | | Graph | | State | | Provider | |
| | Loader |->| Builder |->| Manager |<-| Plugins | |
| | (.tf files) | | (dependency)| | (read/write)| | (AWS, Azure...) | |
| +-------------+ +-------------+ +------+------+ +--------+--------+ |
| | | | | |
| v v v v |
| +---------------------------------------------------------------------+ |
| | Execution: plan (state+config -> diff) or apply (call providers) | |
| +---------------------------------------------------------------------+ |
+----------------------------------------------------------------------------+
        |                      |                      |
        v                      v                      v
  Your .tf files       State (local file         Cloud / API
  (desired state)      or remote backend)     (actual resources)
```
### Core components
| Component | Role |
|---|---|
| Config loader | Reads .tf and .tf.json files, resolves variables and modules, and builds an internal representation of the desired infrastructure. |
| Graph builder | Builds a dependency graph of resources (e.g. subnet before VM). Plan and apply execute in an order that respects dependencies. |
| State manager | Reads and writes state: a mapping of your config (resource addresses) to real resource IDs and attributes. Used for drift detection and update planning. |
| Execution engine | For plan: compares desired (config) vs current (state), and asks providers for refresh; produces a change set. For apply: invokes provider APIs to create/update/delete resources, then updates state. |
### Providers

- Providers are plugins that talk to a specific API (e.g. AWS, Azure, Kubernetes). Each resource type (e.g. `aws_instance`, `azurerm_resource_group`) belongs to a provider.
- Terraform downloads providers at `terraform init` and uses them during plan and apply.
- You declare which providers and versions you need; the core stays small and generic, and all API logic lives in providers.
### State

- State maps each resource address to the provider's resource ID and stored attributes. Terraform uses it to know what it already created and what to change or destroy.
- The backend determines where state lives: local (a file on disk) or remote (e.g. S3 + DynamoDB, Azure Storage, Terraform Cloud). Remote backends enable team collaboration, locking, and better security (no state on laptops).
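State can also be inspected read-only from the CLI; a minimal sketch of the commands involved (the resource address is illustrative):

```shell
# List the resource addresses Terraform currently tracks in state
terraform state list

# Show the stored attributes of one tracked resource
terraform state show aws_s3_bucket.app

# Refresh-only plan: detect drift between state and real infrastructure
# without proposing any config-driven changes (Terraform >= 0.15.4)
terraform plan -refresh-only
```

The refresh-only plan is a convenient way to answer "has anything changed behind Terraform's back?" without risking an apply.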
## How Terraform works

### 1. Write configuration (.tf)

You define resources, data sources, variables, and outputs in HCL:
```hcl
# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_s3_bucket" "app" {
  bucket = "${var.project_name}-app-${var.environment}"

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
```
### 2. `terraform init`

- Downloads the provider plugins (and module sources if you use modules).
- Initializes the backend (e.g. configures remote state).
- Run once per working directory (and again after adding providers or changing the backend).

```shell
terraform init
```
### 3. `terraform plan`

- Refreshes state: Asks providers to update state with the current reality (optional but on by default).
- Plans: Compares config (desired) with state (current) and computes create, update, or destroy actions.
- Output: A human-readable plan; no changes are made. Use this to review before apply.

```shell
terraform plan -out=tfplan
```
### 4. `terraform apply`

- Runs the planned changes: calls provider APIs to create/update/delete resources.
- Updates state after each successful change.
- With `-auto-approve` it skips the confirmation prompt (useful in CI); otherwise Terraform asks for confirmation.

```shell
terraform apply tfplan
# or
terraform apply
```
### 5. `terraform destroy`

- Plans and applies the destruction of all resources in state (in reverse dependency order). Use with care; often you want to destroy only a subset (target or separate state).
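Scoping destruction to a subset can be done with `-target`; a sketch (the resource address is illustrative):

```shell
# Destroy a single resource (and its dependents) after confirming the prompt
terraform destroy -target=aws_s3_bucket.app

# Or review a targeted destroy plan first, then apply it
terraform plan -destroy -target=aws_s3_bucket.app -out=destroy.tfplan
terraform apply destroy.tfplan
```

Terraform warns when `-target` is used because it leaves config and state intentionally out of sync; treat it as an exceptional operation, not a routine one.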
### End-to-end flow

- You edit `.tf` files (desired state).
- `terraform init` ensures providers and backend are ready.
- `terraform plan` refreshes state, diffs desired vs current, and outputs the change set.
- `terraform apply` executes the change set via providers and writes new state.
- State is the source of truth for “what Terraform manages”; real infrastructure is the source of truth for “what actually exists.” Terraform reconciles the two.
## Best practices

### 1. State management
- Use a remote backend in production (e.g. S3 + DynamoDB for locking, Azure Storage, GCS, or Terraform Cloud). Avoid long-lived local state.
- Enable state locking so two runs never apply at the same time (DynamoDB for S3 backend, etc.).
- Never commit state with secrets or sensitive data to Git. Prefer remote state and restrict access with IAM.
- Separate state per environment or layer (e.g. one state for networking, one for compute) to limit blast radius and allow different lifecycles.
```hcl
# backend.tf (example: S3 + DynamoDB for AWS)
terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "prod/app/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```
### 2. Use modules

- Modules are reusable bundles of resources (e.g. "VPC module," "ECS service module"). Use them to avoid copy-paste and to standardize patterns.
- Compose environments from the same module with different variables (e.g. dev vs prod).
- Prefer HashiCorp and community modules where they fit (e.g. `terraform-aws-modules/vpc/aws`) and wrap or extend them if needed.
```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0"

  name = "${var.project_name}-vpc"
  cidr = var.vpc_cidr

  azs             = var.azs
  private_subnets = var.private_subnet_cidrs
  public_subnets  = var.public_subnet_cidrs

  enable_nat_gateway = true
  single_nat_gateway = var.environment != "prod"
}
```
### 3. Pin provider and Terraform versions

- In the `terraform` block, pin `required_providers` (with version constraints) so everyone and CI get the same provider behavior.
- Pin the Terraform version when a team or CI runs the config (e.g. `required_version = ">= 1.5.0"`).
```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```
### 4. Variables and outputs

- Use variables for environment-specific or sensitive values (region, env name, instance type). Give defaults where it makes sense; use variable validation to fail fast on bad input.
- Use outputs to expose IDs, ARNs, and endpoints to other Terraform configs (via remote state) or to CI/CD.
- Sensitive variables: Mark them `sensitive = true` so they are redacted in plan/apply output. Note that sensitive values are still written to state in plain text, so protect the state file; store the values themselves in env vars or a secret manager, not in `.tf` files.
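One common way to keep such values out of the repository is the `TF_VAR_` convention; a sketch, assuming a `db_password` variable exists in your config and the secret ID is illustrative:

```shell
# Terraform maps TF_VAR_<name> to variable "<name>". The value never
# appears in the repo (it is still written to state, so protect state).
export TF_VAR_db_password="$(aws secretsmanager get-secret-value \
  --secret-id prod/db/password \
  --query SecretString --output text)"

terraform plan -out=tfplan
```

In CI, the same variable is usually injected from the pipeline's secret store instead of a local `export`.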
### 5. Naming and structure

- Resource names: Use a consistent scheme (e.g. `{project}-{env}-{resource}`) and tags (Environment, Project, ManagedBy) so you can identify and govern resources.
- File layout: Split by concern (e.g. `main.tf`, `variables.tf`, `outputs.tf`, `versions.tf`, `backend.tf`) or by layer (e.g. `network.tf`, `compute.tf`). Keep modules in a `modules/` directory.
### 6. Security (overview)

- No secrets in code: Use `TF_VAR_*` environment variables or a secret store; mark variables as `sensitive`.
- Least privilege: Run Terraform (and CI) with IAM roles that have only the permissions needed for the resources you manage.
- Private state: Store state in a bucket/container with encryption and access controls; use a private backend or VPC endpoints where possible.
- Review plans: In CI, run `terraform plan` and require approval before `apply`; consider policy as code (e.g. Sentinel, OPA) for guardrails.
See Advanced: Security and hardening and Secure Terraform code examples below for details and ready-to-use patterns.
### 7. Workspaces vs separate state

- Workspaces (e.g. `default`, `dev`, `prod`) use one backend with different state keys. They are simple but can be confusing; naming and discipline matter.
- Separate directories or repos per environment, each with its own state and backend config, give clear separation and are often easier to reason about for prod vs non-prod.
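If you do use workspaces, the CLI flow is short; a sketch (workspace names are illustrative):

```shell
terraform workspace list          # show workspaces; * marks the current one
terraform workspace new dev       # create "dev" and switch to it
terraform workspace select prod   # switch to an existing workspace

# Inside config, terraform.workspace can parameterize names, e.g.:
#   bucket = "${var.project_name}-${terraform.workspace}-data"
```

Because the active workspace is implicit, a wrapper script or prompt indicator that shows the current workspace helps prevent applying dev changes to prod.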
### 8. Plan and apply in CI

- Run `terraform fmt -check` and `terraform validate` in CI.
- Run `terraform plan` on every change and store the plan artifact; apply only after review (or from a protected branch). Use `-target` or `-destroy` sparingly and with explicit approval.
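A minimal CI script for these checks might look like this; a sketch, assuming the runner has Terraform installed and backend credentials configured:

```shell
set -euo pipefail

terraform fmt -check -recursive          # fail the build on unformatted files
terraform init -input=false              # non-interactive init for CI
terraform validate                       # catch syntax and reference errors
terraform plan -input=false -out=tfplan  # save the plan as a build artifact

# The apply runs in a separate, manually approved job against that artifact:
# terraform apply -input=false tfplan
```

Applying the saved `tfplan` artifact (rather than re-planning) guarantees that exactly the reviewed changes are executed.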
## Advanced: Security and hardening

### Secrets management
| Practice | Description |
|---|---|
| Never commit secrets | No passwords, API keys, or tokens in .tf, .tfvars, or state. Use environment variables (TF_VAR_*), a secret manager (HashiCorp Vault, AWS Secrets Manager), or CI secrets. |
| Mark sensitive variables | Set sensitive = true on variables and outputs so Terraform redacts them in logs and in terraform plan output. |
| Sensitive in state | State can contain sensitive values (e.g. DB password in a resource attribute). Always use a remote backend with encryption and strict access control; never commit state. |
| Provider credentials | Prefer IAM roles (e.g. EC2 instance profile, OIDC in CI) over long-lived access keys. If you use keys, inject via env vars, not files in the repo. |
### State security
- Remote backend only in production: Use S3, Azure Storage, GCS, or Terraform Cloud with encryption at rest.
- State locking: Prevent concurrent apply (e.g. DynamoDB for S3 backend). Reduces risk of state corruption and conflicting changes.
- Access control: Restrict who can read/write state (IAM, RBAC). Prefer separate state per environment and least-privilege roles for CI.
- Encryption: Enable server-side encryption on the state bucket/container; use KMS where available for audit and key control.
- Private access: Use VPC endpoints or private connectivity to the state backend so state does not traverse the public internet.
### Least privilege for Terraform execution

- Scoped IAM: The identity that runs `terraform apply` (a user or CI role) should have only the permissions required to create/update/delete the resources in your config. Avoid broad policies like `*:*` on a whole service.
- Policy documents in Terraform: Define IAM policies and roles in Terraform and attach them to the runner; use conditions (e.g. `aws:RequestedRegion`, `aws:SourceAccount`) to tighten scope.
- Separate roles per stack: Use different roles for dev vs prod so a compromise or mistake in dev does not affect production.
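A condition-scoped policy for a CI runner can itself be defined in Terraform; a sketch (the names, actions, and single-region scope are illustrative, not a recommended production policy):

```hcl
# Illustrative: a CI role policy limited to project buckets in one region
data "aws_iam_policy_document" "ci_s3" {
  statement {
    effect    = "Allow"
    actions   = ["s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging"]
    resources = ["arn:aws:s3:::my-project-*"]

    # Deny-by-omission: the statement only matches calls made in this region
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = ["us-east-1"]
    }
  }
}

resource "aws_iam_policy" "ci_s3" {
  name   = "ci-terraform-s3"
  policy = data.aws_iam_policy_document.ci_s3.json
}
```

Attaching this policy to the CI role keeps the blast radius of a leaked credential to one service, one naming prefix, and one region.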
### Supply chain and integrity

- Provider version pinning: Pin `required_providers` with a version constraint (e.g. `~> 5.0`) and run `terraform init -upgrade` only in a controlled way. Commit `.terraform.lock.hcl` so everyone uses the same provider binaries.
- Module sources: Prefer modules from trusted registries (HashiCorp, official); pin a version or commit. For private modules, use a private registry or tagged Git refs.
- Verification: Terraform can verify provider checksums (in the lock file). In locked-down environments, use a private provider mirror and verify hashes.
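The lock file can be pre-populated with checksums for every platform your team and CI use, so `init` verifies the same binaries everywhere; a sketch:

```shell
# Record provider checksums for all platforms that will run this config,
# then commit the updated .terraform.lock.hcl
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64 \
  -platform=windows_amd64
```

Without the extra platforms, a lock file created on a macOS laptop may fail checksum verification on Linux CI runners.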
### Policy as code and scanning

- Pre-apply checks: Use tools like tfsec, Checkov, or Trivy to scan `.tf` files for misconfigurations (e.g. public S3 buckets, open security groups, unencrypted storage).
- Terraform Cloud / Enterprise: Sentinel policies can enforce rules (e.g. "no instance without a tag", "only approved instance types") before an apply.
- OPA (Open Policy Agent): Integrate with CI to evaluate the Terraform plan or state against custom policies (e.g. "no new public IPs in prod").
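Each of the scanners above is a single command against the working directory; a sketch (default invocations, exact flags may vary by version):

```shell
tfsec .          # static scan of HCL for common misconfigurations
checkov -d .     # policy checks across the .tf files in the directory
trivy config .   # Trivy's IaC misconfiguration scanner
```

Running one of these in CI before `terraform plan` turns "public S3 bucket" from a post-incident finding into a failed build.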
### Secure resource defaults

- Encryption: Enable encryption at rest (S3, EBS, RDS, etc.) and in transit (TLS) by default in your modules and examples.
- Networking: Prefer private subnets and security groups that allow only necessary ingress/egress; avoid `0.0.0.0/0` unless required and documented.
- Logging and auditing: Enable CloudTrail, flow logs, and resource-level logging where relevant so you can audit changes and investigate incidents.
## Secure Terraform code examples

The following snippets show patterns that harden your Terraform source.

### 1. Sensitive variables and validation
```hcl
# variables.tf
variable "db_password" {
  description = "Database master password"
  type        = string
  sensitive   = true

  validation {
    condition     = length(var.db_password) >= 16
    error_message = "Password must be at least 16 characters."
  }
}

variable "environment" {
  type        = string
  description = "Environment name (dev, staging, prod)"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}
```
- Use `sensitive = true` so the value is never printed in plan/apply output or logs.
- Use `validation` blocks to fail fast on invalid or dangerous values (e.g. prod safeguards).

### 2. Remote backend with encryption and locking
```hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "ap-southeast-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:ap-southeast-1:123456789012:key/..."
  }
}
```
- `encrypt = true` enables server-side encryption; `kms_key_id` uses a customer-managed key for audit and control.
- `dynamodb_table` enables state locking so concurrent applies are blocked.

### 3. No hardcoded credentials; use env or data source
```hcl
# Bad: never do this
# provider "aws" {
#   access_key = "AKIA..."
#   secret_key = "..."
# }

# Good: credentials from environment (or IAM role if on EC2/ECS/Lambda)
provider "aws" {
  region = var.aws_region
  # access_key and secret_key omitted: use AWS_ACCESS_KEY_ID,
  # AWS_SECRET_ACCESS_KEY, or an IAM role
}

# Good: fetch existing secret from a secret manager instead of passing raw values
data "aws_secretsmanager_secret_version" "db" {
  secret_id = var.db_secret_arn
}

locals {
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}
```
- Provider credentials come from the environment or an instance profile/role; never put them in `.tf` or `.tfvars` files committed to Git.
- Use data sources (e.g. Secrets Manager, SSM Parameter Store) to pull secrets at apply time instead of `TF_VAR_` when the secret already lives in a secure store.
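The decoded secret can then feed a resource without the raw value ever appearing in `.tf` or `.tfvars`; a sketch (the `local.db_credentials` keys are assumptions about the secret's JSON shape, and the instance settings are illustrative):

```hcl
# Assumes the secret JSON looks like {"username": "...", "password": "..."}
resource "aws_db_instance" "main" {
  identifier        = "${var.project_name}-${var.environment}-db"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20

  username = local.db_credentials.username
  password = local.db_credentials.password # still written to state: protect it
}
```

Note the trade-off: the password bypasses Git entirely but is still recorded in state, which is another reason the backend must be encrypted and access-controlled.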
### 4. Secure S3 bucket (encryption, versioning, block public access)
```hcl
resource "aws_s3_bucket" "app_data" {
  bucket = "${var.project_name}-${var.environment}-data"

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_s3_bucket_versioning" "app_data" {
  bucket = aws_s3_bucket.app_data.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_kms_key" "s3" {
  description             = "KMS key for ${var.project_name} S3 bucket encryption"
  deletion_window_in_days = 10
  enable_key_rotation     = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "app_data" {
  bucket = aws_s3_bucket.app_data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.s3.arn
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_public_access_block" "app_data" {
  bucket = aws_s3_bucket.app_data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```
- Versioning for recovery; KMS encryption for at-rest security; `aws_s3_bucket_public_access_block` to prevent accidental public exposure.

### 5. Restrictive security group (least privilege)
```hcl
resource "aws_security_group" "app" {
  name_prefix = "${var.project_name}-app-"
  vpc_id      = module.vpc.vpc_id
  description = "Application tier; allow only from ALB and required egress"

  ingress {
    description     = "HTTPS from ALB"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    description = "HTTPS to internet (e.g. APIs)"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Name = "${var.project_name}-app"
  }
}
```
- Ingress only from the ALB security group (no `0.0.0.0/0`); egress limited to what the app needs (here HTTPS). Tighten egress further (e.g. VPC endpoints, specific prefixes) where possible.

### 6. Outputs that must stay secret
```hcl
output "db_endpoint" {
  value       = aws_db_instance.main.endpoint
  description = "Database endpoint"
}

output "db_password" {
  value       = aws_db_instance.main.password
  sensitive   = true
  description = "Database password (redacted in logs)"
}
```
- Mark any output that could contain secrets as `sensitive = true` so it is never printed in logs or in the plan.

### 7. Version and lock file (reproducible, verifiable runs)
```hcl
# versions.tf
terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```
- Pin `required_version` and `required_providers`; run `terraform init` and commit `.terraform.lock.hcl` so all environments and CI use the same provider versions and checksums.

## Summary
| Topic | Takeaway |
|---|---|
| Architecture | Core (config, graph, state, execution) + providers (plugins) + state (mapping from config to real IDs). |
| Flow | init → plan (diff desired vs state) → apply (call providers, update state). State is the link between config and real resources. |
| Best practices | Remote state with locking; modules; version pinning; variables/outputs and sensitive handling; consistent naming and tags; no secrets in code; review plans and least-privilege IAM. |
| Security | No secrets in code; sensitive variables and outputs; encrypted remote state and locking; least-privilege IAM for Terraform; supply chain (pinned providers, lock file); policy/scanning (tfsec, Sentinel, OPA); secure defaults (encryption, restrictive security groups). |
| Secure code | Use validation and sensitive on variables; backend with encryption and DynamoDB lock; credentials from env or IAM; data sources for secrets; S3 encryption and public access block; restrictive security groups; sensitive outputs; pin versions and commit lock file. |
Terraform lets you manage cloud and other APIs declaratively. Combining its architecture and workflow with security best practices and secure Terraform code patterns (secrets handling, state protection, least privilege, and scanning) keeps infrastructure safe and maintainable as you scale.