quyennv.com

Senior DevOps Engineer · Healthcare, Singapore

Kubernetes (K8S): Architecture, Pods, Deployments, and Security

#kubernetes #k8s #containers #orchestration #devops #cloud

Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. You describe the desired state (e.g. “run 3 replicas of this image”), and Kubernetes keeps the cluster in that state.

Why Kubernetes?

  • Orchestration: Schedule and run containers across many nodes; handle restarts and placement.
  • Scaling: Scale workloads up or down (manually or with autoscalers).
  • Self-healing: Restart failed containers, replace unhealthy pods, reschedule when nodes fail.
  • Declarative config: Define desired state in YAML; Kubernetes reconciles the actual state.

Kubernetes architecture

A Kubernetes cluster is split into two planes: the control plane (manages the cluster) and the data plane (runs your workloads).

High-level view

+----------------------------- CONTROL PLANE -----------------------------+
|  +-------------+  +----------+  +---------------------+                 |
|  | API Server  |  | Scheduler|  | Controller Manager  |                 |
|  +------+------+  +-----+----+  +----------+----------+                 |
|         |                 |                  |                          |
|         |           +-----+-----+            |                          |
|         +---------->|   etcd    |<-----------+                          |
|                     | (storage) |                                       |
+---------------------+-----+-----+---------------------------------------+
                           |
+---------------------------- DATA PLANE (Nodes) --------------------------+
|              +------------------+------------------+                     |
|              v                                    v                      |
|  +------------------+                  +------------------+              |
|  |     Node 1       |                  |     Node 2       |   ...        |
|  |  kubelet         |                  |  kubelet         |              |
|  |  kube-proxy      |                  |  kube-proxy      |              |
|  | container runtime|                  | container runtime|              |
|  |  [Pods]          |                  |  [Pods]          |              |
|  +------------------+                  +------------------+              |
+--------------------------------------------------------------------------+

Control plane components

  • API Server (kube-apiserver): Single entrypoint for all cluster operations. Validates and processes REST requests; updates etcd. kubectl and other clients talk only to the API server.
  • etcd: Distributed key-value store holding cluster state (desired and current). Only the API server reads/writes etcd. High availability is critical for production.
  • Scheduler (kube-scheduler): Watches for newly created pods with no assigned node; selects a node (based on resources, affinity, taints/tolerations) and assigns the pod.
  • Controller Manager (kube-controller-manager): Runs controllers that reconcile state: Node Controller, Deployment Controller, ReplicaSet Controller, etc. They watch the API and drive the cluster toward the desired state.
  • Cloud Controller Manager: Optional; ties the cluster to cloud provider APIs (load balancers, nodes, routes). Used on AKS, EKS, GKE.

Data plane (worker nodes)

  • kubelet: Agent on each node. Registers the node with the API server; ensures containers in pods are running (pulls images, starts/stops containers, reports status).
  • kube-proxy: Network proxy on each node. Implements the Service abstraction: maintains iptables or IPVS rules so traffic to a Service IP/port is forwarded to backend pods.
  • Container runtime: Software that runs containers (containerd, CRI-O, etc.). kubelet talks to it via the Container Runtime Interface (CRI).

Request flow (example: create a Deployment)

  1. You run kubectl apply -f deployment.yaml → kubectl sends the manifest to the API Server.
  2. API Server validates and stores the Deployment in etcd.
  3. Deployment controller (in Controller Manager) sees the new Deployment and creates a ReplicaSet; the ReplicaSet controller then creates Pod objects (no node yet).
  4. Scheduler sees Pods with no nodeName, selects nodes, and updates each Pod with the chosen node (write to etcd via API Server).
  5. kubelet on each assigned node sees new Pods, pulls images via the container runtime, and starts containers.
  6. kubelet reports Pod status back to the API Server; controllers and users see the cluster state.

Core concepts

  • Pod: Smallest deployable unit: one or more containers that share storage and network.
  • Deployment: Declarative way to manage a set of identical pods (replicas, rolling updates).
  • Service: Stable network endpoint to reach pods (cluster IP, NodePort, or LoadBalancer).
  • Namespace: Virtual cluster for grouping and isolating resources (e.g. dev, prod).
  • Node: A worker machine (VM or physical) that runs pods.

Kubernetes resources overview

The following table summarizes the main Kubernetes resources (as in Kubernetes in Action, Lukša). Cluster-level resources are not namespaced; others live in a namespace.

  • Namespace (ns) [v1]: Organizes resources into non-overlapping groups (e.g. per tenant, env).
  • Pod (po) [v1]: Basic deployable unit: one or more co-located containers sharing network and storage.
  • ReplicaSet (rs) [apps/v1]: Keeps a set of pod replicas running; used by Deployment.
  • ReplicationController (rc) [v1]: Older, less capable way to keep pod replicas; prefer ReplicaSet.
  • Deployment (deploy) [apps/v1]: Declarative deployment and rolling updates of pods via ReplicaSet.
  • StatefulSet (sts) [apps/v1]: Manages stateful pods with stable identity and ordered deployment.
  • DaemonSet (ds) [apps/v1]: Runs one pod replica per node (all nodes or those matching a selector).
  • Job [batch/v1]: Runs pods until a completable task succeeds (one or more pods).
  • CronJob [batch/v1]: Runs a Job on a schedule (cron expression).
  • Service (svc) [v1]: Exposes one or more pods at a stable IP and port (ClusterIP, NodePort, LoadBalancer).
  • Endpoints (ep) [v1]: Lists the pod IPs that back a Service (usually auto-managed).
  • Ingress (ing) [networking.k8s.io/v1]: Exposes services to the outside via HTTP(S) host/path routing.
  • ConfigMap (cm) [v1]: Key-value config for apps (non-sensitive); mount as files or env.
  • Secret [v1]: Sensitive data (passwords, tokens); base64-encoded, so enable encryption at rest.
  • PersistentVolume (pv) [v1]: Cluster-level piece of storage; bound by a PersistentVolumeClaim.
  • PersistentVolumeClaim (pvc) [v1]: Request for storage; bound to a PersistentVolume or dynamic provisioner.
  • StorageClass (sc) [storage.k8s.io/v1]: Defines a class of storage for dynamic provisioning of PVCs.

Pods in more detail

  • Lifecycle phases: Pending → Running (or Succeeded/Failed for one-off pods). A pod is Pending until scheduled and until at least one container has started.
  • Init containers: Run to completion before the main containers start. Use them for setup (e.g. migrate DB, wait for a dependency). They run in order; if one fails, the pod is restarted (according to restartPolicy).
  • Multiple containers in a pod: Share the same network namespace (localhost) and can share volumes. Typical pattern: main app + sidecar (e.g. log shipper, proxy). The kubelet restarts an exited container according to the pod's restartPolicy (Always or OnFailure); the pod itself stays in place on its node.
spec:
  initContainers:
    - name: init-db
      image: busybox
      command: ['sh', '-c', 'until nslookup db; do sleep 2; done']
  containers:
    - name: app
      image: my-app:latest
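
The sidecar pattern mentioned above can be sketched as a two-container pod sharing an emptyDir volume (the container, image, and path names here are illustrative, not from a specific app):

```yaml
spec:
  volumes:
    - name: logs              # shared scratch space, deleted with the pod
      emptyDir: {}
  containers:
    - name: app
      image: my-app:latest
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper       # sidecar: reads what the app writes
      image: busybox
      command: ['sh', '-c', 'tail -F /var/log/app/app.log']
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
```

Because both containers share the pod's network namespace, they can also reach each other on localhost.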

Workload resources: ReplicaSet, Job, DaemonSet, StatefulSet

  • ReplicaSet: Keep N identical pod replicas; use via Deployment, not alone.
  • Job: Run a batch task until success (e.g. backup, migration). Key fields: completions, parallelism, backoffLimit.
  • CronJob: Run a Job on a schedule (e.g. "0 * * * *" every hour).
  • DaemonSet: One pod per node (e.g. node exporter, log collector, CNI).
  • StatefulSet: Stateful apps with stable identity: stable pod name and storage, ordered create/delete.
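
A Job using the fields above might look like this (a sketch; the name, image, and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup
spec:
  completions: 1        # how many successful pods are required
  parallelism: 1        # how many pods may run at once
  backoffLimit: 3       # retries before the Job is marked Failed
  template:
    spec:
      restartPolicy: Never   # Jobs require Never or OnFailure
      containers:
        - name: backup
          image: my-backup-tool:latest
          command: ['sh', '-c', 'run-backup.sh']
```

For a scheduled run, wrap the same pod template in a CronJob with e.g. schedule: "0 * * * *".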

Services and networking

  • ClusterIP: Default. A virtual IP inside the cluster; pods reach the service by name (DNS: <svc>.<ns>.svc.cluster.local). Endpoints are created automatically from the service selector and list the backing pod IPs.
  • NodePort: Exposes the service on each node’s IP at a static port (30000–32767). Good for dev or when you don’t have a load balancer.
  • LoadBalancer: Provisions an external load balancer (cloud or on-prem). Often used with Ingress for HTTP(S).
  • Headless service: clusterIP: None. No cluster IP; DNS returns all pod IPs. Used for StatefulSet or when clients need to talk to specific pods.
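
A headless service can be sketched as follows (names are illustrative); DNS for my-db returns the individual pod IPs instead of a single virtual IP:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-db
spec:
  clusterIP: None        # headless: no virtual IP, DNS returns pod IPs
  selector:
    app: my-db
  ports:
    - port: 5432
```

Combined with a StatefulSet, each pod also gets a stable per-pod DNS name such as my-db-0.my-db.<ns>.svc.cluster.local.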

Ingress exposes HTTP(S) routes to services. An Ingress controller (e.g. NGINX, Traefik) watches Ingress resources and configures the load balancer. One Ingress can route multiple hosts/paths to different ClusterIP services.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-svc
                port:
                  number: 80

ConfigMap and Secret (configuration)

  • ConfigMap: Store non-sensitive config (URLs, feature flags, config files). Mount as a volume or inject as environment variables. Updates to a ConfigMap are eventually propagated to mounted volumes (on the kubelet sync period), but environment variables are not refreshed in running pods.
  • Secret: Same idea for sensitive data (passwords, TLS certs). Stored base64; enable encryption at rest for the API server in production. Mount as a volume or env; prefer projected volumes or external secret operators for rotation.
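
A minimal sketch of both injection styles, assuming a hypothetical app-config ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  API_URL: "https://api.example.com"
  app.properties: |
    feature.flag=true
---
# In the pod spec: env var from one key, file mount for the whole map
spec:
  containers:
    - name: app
      image: my-app:latest
      env:
        - name: API_URL
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: API_URL
      volumeMounts:
        - name: config
          mountPath: /etc/app
  volumes:
    - name: config
      configMap:
        name: app-config
```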

Volumes and persistent storage

  • emptyDir: Temporary directory per pod; deleted when the pod is removed. Good for scratch space or sharing data between containers in a pod.
  • PersistentVolumeClaim (PVC): Request storage (size, StorageClass). The cluster binds it to a PersistentVolume (PV) or triggers dynamic provisioning. Pods mount the PVC; data survives pod restarts.
  • StorageClass: Defines a provisioner and parameters (e.g. cloud disk type). When you create a PVC that references a StorageClass, the provisioner creates the backing volume and binds the PVC.
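
A PVC and its mount might look like this; the storage class name is a placeholder for whatever your cluster provides:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard   # triggers dynamic provisioning if defined
  resources:
    requests:
      storage: 10Gi
---
# In the pod spec: mount the claim like any other volume
spec:
  containers:
    - name: app
      image: my-app:latest
      volumeMounts:
        - name: data
          mountPath: /var/lib/app
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc
```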

Minimal Deployment example

Save as app-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-registry.io/my-app:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "128Mi"
              cpu: "200m"

Create the deployment:

kubectl apply -f app-deployment.yaml

Exposing the app with a Service

apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP   # or NodePort / LoadBalancer

Save as app-service.yaml, then apply it:

kubectl apply -f app-service.yaml

Essential kubectl commands

# List pods (default namespace)
kubectl get pods

# List pods in a namespace
kubectl get pods -n production

# Describe a pod (events, state, details)
kubectl describe pod <pod-name>

# View logs from a pod
kubectl logs <pod-name>

# Follow logs (like tail -f)
kubectl logs -f <pod-name>

# Execute a command in a pod
kubectl exec -it <pod-name> -- sh

# List deployments
kubectl get deployments

# Scale a deployment
kubectl scale deployment my-app --replicas=5

# Delete a deployment and its pods
kubectl delete deployment my-app

Pod lifecycle and restarts

  • Kubernetes keeps the number of replicas you specified; if a pod exits or fails, it is replaced.
  • livenessProbe tells Kubernetes when to restart a container (the kubelet kills and restarts it after repeated failures); readinessProbe tells Kubernetes when a pod is ready to receive Service traffic:
containers:
  - name: app
    image: my-app:latest
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 3000
      initialDelaySeconds: 2
      periodSeconds: 5

Namespaces

# List namespaces
kubectl get namespaces

# Create a namespace
kubectl create namespace staging

# Run a one-off pod in a namespace
kubectl run debug --image=busybox -n staging -- sleep 3600

Security for Kubernetes

Securing a Kubernetes cluster involves the control plane, the nodes, the network, and the workloads. Below are the main areas and practices.

1. RBAC (Role-Based Access Control)

RBAC controls who can do what in the cluster (e.g. list pods, create deployments). Define Roles (or ClusterRoles) and RoleBindings (or ClusterRoleBindings) to grant permissions to users, groups, or service accounts.

# Example: Role that allows reading pods in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
  - kind: ServiceAccount
    name: ci-bot
    namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

  • Principle of least privilege: Grant only the permissions needed.
  • Prefer namespaced Roles/RoleBindings; use ClusterRole/ClusterRoleBinding only for cluster-wide access (e.g. admin, node viewer).

2. Secrets management

  • Kubernetes Secrets store sensitive data (passwords, tokens, TLS certs) as base64; they are not encrypted at rest by default. Enable encryption at rest for the API server (e.g. with a KMS provider) in production.
  • Avoid putting secrets in plain YAML in Git. Use external secret managers (e.g. HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) with operators (e.g. External Secrets Operator) to sync into Kubernetes Secrets.
  • Prefer projected volumes or CSI secret stores so pods get only the secrets they need.

# Mount a secret as a file in a pod
spec:
  containers:
    - name: app
      volumeMounts:
        - name: db-secret
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: db-secret
      secret:
        secretName: db-credentials

3. Network policies

By default, pods in a cluster can often talk to any other pod. NetworkPolicy restricts ingress/egress traffic (e.g. only allow frontend → backend, block cross-namespace traffic).

# Allow only pods with label role=frontend to reach pods with label app=api on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080

  • Enforcing NetworkPolicy requires a CNI plugin that supports it (e.g. Calico, Cilium).
  • Start with deny-by-default or explicit allow lists for critical namespaces.
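
A deny-by-default starting point can be sketched as a policy that selects all pods but allows nothing; specific allow policies (like the frontend → api one above) then punch holes in it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}        # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```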

4. Pod security (security context and Pod Security Standards)

  • Security context on pods/containers: run as non-root user (runAsNonRoot, runAsUser), drop capabilities (securityContext.capabilities.drop: ["ALL"]), read-only root filesystem where possible.
  • Pod Security Standards (PSS): Privileged, Baseline, Restricted. Enforce via Pod Security Admission (labels on namespaces) or a policy engine (e.g. OPA Gatekeeper, Kyverno).

# Example: restricted-style pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        readOnlyRootFilesystem: true
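
Pod Security Admission is enabled per namespace via labels; a sketch for enforcing the Restricted profile:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Pods that violate the profile (e.g. running as root or allowing privilege escalation) are rejected at admission.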

5. Image security

  • Use private or trusted registries; avoid latest tag in production.
  • Image scanning (e.g. Trivy, Snyk) in CI and at admission (e.g. Trivy admission controller, Gatekeeper) to block vulnerable images.
  • Image signing and verification: use Cosign and policy-controller (or similar) so only signed images are allowed.
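
One simple practice from the list above is referencing images by digest instead of a mutable tag; the digest below is a placeholder, not a real image:

```yaml
containers:
  - name: app
    # A digest is immutable; a tag like :latest can silently change
    image: my-registry.io/my-app@sha256:<digest-of-scanned-and-signed-image>
```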

6. Control plane and node hardening

  • API server: Restrict access (firewall, private endpoints); enable audit logging; use admission controllers (e.g. PodSecurity, validating webhooks) to enforce policies.
  • etcd: Encrypt at rest; restrict network access to API server only.
  • Nodes: Keep OS and kubelet/runtime updated; use node hardening (CIS benchmarks); consider read-only root filesystem and minimal images for the host where possible.
  • kubelet: Configure anonymousAuth: false; use NodeRestriction admission to limit what kubelets can do.
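
The kubelet settings mentioned above live in its KubeletConfiguration file; a sketch of the relevant fields:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false       # reject unauthenticated requests to the kubelet API
  webhook:
    enabled: true        # delegate authentication to the API server
authorization:
  mode: Webhook          # delegate authorization to the API server
```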

7. Summary: security checklist

  • Access: RBAC with least privilege; avoid cluster-admin in production.
  • Secrets: Encryption at rest; external secret manager; minimal exposure to pods.
  • Network: NetworkPolicy for segmentation; restrict egress where possible.
  • Workloads: Non-root, drop capabilities, read-only root; enforce PSS (Baseline/Restricted).
  • Images: Scan in CI and at admission; sign and verify images.
  • Cluster: Harden API server, etcd, and nodes; audit logs and admission control.

Summary

  • Architecture: Control plane (API server, etcd, scheduler, controllers) manages the cluster; data plane (kubelet, kube-proxy, container runtime) runs pods on nodes.
  • Pods run your containers (init containers, multi-container pods); Deployments manage replicas and rolling updates via ReplicaSet; Services (ClusterIP, NodePort, LoadBalancer, headless) and Ingress expose pods on the network.
  • Workloads: Use Job/CronJob for batch/scheduled tasks, DaemonSet for one pod per node, StatefulSet for stateful apps with stable identity and storage.
  • Config: ConfigMap and Secret inject configuration; PersistentVolumeClaim and StorageClass provide persistent storage.
  • Use kubectl to apply YAML and inspect pods, deployments, services, and other resources.
  • Security: Use RBAC, NetworkPolicy, Secrets (with encryption), pod security contexts and PSS, image scanning, and control-plane/node hardening for production.

For production, you typically add ConfigMaps, Secrets, and Ingress, and run on a managed cluster (e.g. AKS, EKS, GKE); for learning, a local setup like minikube or kind is enough.

References

  • Kubernetes in Action (2nd ed.), Marko Lukša, Manning — comprehensive coverage of Pods, ReplicaSet, Deployment, Service, Endpoints, Ingress, ConfigMap, Secret, PV/PVC, StorageClass, Job, CronJob, DaemonSet, StatefulSet, and more.
