Kubernetes (K8s): Architecture, Pods, Deployments, and Security
#kubernetes #k8s #containers #orchestration #devops #cloud
Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. You describe the desired state (e.g. “run 3 replicas of this image”), and Kubernetes keeps the cluster in that state.
Why Kubernetes?
- Orchestration: Schedule and run containers across many nodes; handle restarts and placement.
- Scaling: Scale workloads up or down (manually or with autoscalers).
- Self-healing: Restart failed containers, replace unhealthy pods, reschedule when nodes fail.
- Declarative config: Define desired state in YAML; Kubernetes reconciles the actual state.
Kubernetes architecture
A Kubernetes cluster is split into two planes: the control plane (manages the cluster) and the data plane (runs your workloads).
High-level view
+----------------------------- CONTROL PLANE -----------------------------+
|  +-------------+      +-----------+      +---------------------+       |
|  | API Server  |      | Scheduler |      | Controller Manager  |       |
|  +------+------+      +-----+-----+      +----------+----------+       |
|         |                   |                       |                  |
|         |             +-----+-----+                 |                  |
|         +------------>|   etcd    |<----------------+                  |
|                       | (storage) |                                    |
+-----------------------+-----+-----+------------------------------------+
                              |
+---------------------------- DATA PLANE (Nodes) -------------------------+
|             +---------------+----------------+                         |
|             v                                v                         |
|  +--------------------+          +--------------------+                |
|  | Node 1             |          | Node 2             |      ...       |
|  | kubelet            |          | kubelet            |                |
|  | kube-proxy         |          | kube-proxy         |                |
|  | container runtime  |          | container runtime  |                |
|  | [Pods]             |          | [Pods]             |                |
|  +--------------------+          +--------------------+                |
+-------------------------------------------------------------------------+
Control plane components
| Component | Role |
|---|---|
| API Server (kube-apiserver) | Single entrypoint for all cluster operations. Validates and processes REST requests; updates etcd. kubectl and other clients talk only to the API server. |
| etcd | Distributed key-value store holding cluster state (desired and current). Only the API server reads/writes etcd. High availability is critical for production. |
| Scheduler (kube-scheduler) | Watches for newly created pods with no assigned node; selects a node (based on resources, affinity, taints/tolerations) and assigns the pod. |
| Controller Manager (kube-controller-manager) | Runs controllers that reconcile state: Node Controller, Deployment Controller, ReplicaSet Controller, etc. They watch the API and drive the cluster toward the desired state. |
| Cloud Controller Manager | Optional; ties the cluster to cloud provider APIs (load balancers, nodes, routes). Used on AKS, EKS, GKE. |
Data plane (worker nodes)
| Component | Role |
|---|---|
| kubelet | Agent on each node. Registers the node with the API server; ensures containers in pods are running (pulls images, starts/stops containers, reports status). |
| kube-proxy | Network proxy on each node. Implements Service abstraction: maintains iptables or IPVS rules so traffic to a Service IP/port is forwarded to backend pods. |
| Container runtime | Software that runs containers (containerd, CRI-O, etc.). kubelet talks to it via the Container Runtime Interface (CRI). |
Request flow (example: create a Deployment)
- You run `kubectl apply -f deployment.yaml`; kubectl sends the manifest to the API Server.
- API Server validates the request and stores the Deployment object in etcd.
- Deployment controller (in the Controller Manager) sees the new Deployment and creates a ReplicaSet; the ReplicaSet controller then creates Pod objects (no node assigned yet).
- Scheduler sees Pods with no `nodeName`, selects a node for each, and writes the assignment back via the API Server (persisted in etcd).
- kubelet on each assigned node sees its new Pods, pulls images via the container runtime, and starts the containers.
- kubelet reports Pod status back to the API Server; controllers and users see the updated cluster state.
Core concepts
| Term | Meaning |
|---|---|
| Pod | Smallest deployable unit: one or more containers that share storage and network. |
| Deployment | Declarative way to manage a set of identical pods (replicas, rolling updates). |
| Service | Stable network endpoint to reach pods (cluster IP, NodePort, or LoadBalancer). |
| Namespace | Virtual cluster for grouping and isolating resources (e.g. dev, prod). |
| Node | A worker machine (VM or physical) that runs pods. |
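To make the Pod concept concrete, here is a minimal single-container Pod manifest (the name and image are illustrative; in practice you rarely create bare Pods and let a Deployment manage them instead):

```yaml
# Minimal Pod: one container, one exposed port (names/image are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
  labels:
    app: hello
spec:
  containers:
  - name: hello
    image: nginx:1.25      # illustrative image
    ports:
    - containerPort: 80
```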
Kubernetes resources overview
The following table summarizes the main Kubernetes resources (as in Kubernetes in Action, Lukša). Cluster-level resources are not namespaced; others live in a namespace.
| Resource (abbr.) | API version | Description |
|---|---|---|
| Namespace (ns) | v1 | Organizes resources into non-overlapping groups (e.g. per tenant, env). |
| Pod (po) | v1 | Basic deployable unit: one or more co-located containers sharing network and storage. |
| ReplicaSet (rs) | apps/v1 | Keeps a set of pod replicas running; used by Deployment. |
| ReplicationController (rc) | v1 | Older, less capable way to keep pod replicas; prefer ReplicaSet. |
| Deployment (deploy) | apps/v1 | Declarative deployment and rolling updates of pods via ReplicaSet. |
| StatefulSet (sts) | apps/v1 | Manages stateful pods with stable identity and ordered deployment. |
| DaemonSet (ds) | apps/v1 | Runs one pod replica per node (all nodes or those matching a selector). |
| Job | batch/v1 | Runs pods until a completable task succeeds (one or more pods). |
| CronJob | batch/v1 | Runs a Job on a schedule (cron expression). |
| Service (svc) | v1 | Exposes one or more pods at a stable IP and port (ClusterIP, NodePort, LoadBalancer). |
| Endpoints (ep) | v1 | Lists the pod IPs that back a Service (usually auto-managed). |
| Ingress (ing) | networking.k8s.io/v1 | Exposes services to the outside via HTTP(S) host/path routing. |
| ConfigMap (cm) | v1 | Key-value config for apps (non-sensitive); mount as files or env. |
| Secret | v1 | Sensitive data (passwords, tokens); base64-encoded only, so enable encryption at rest. |
| PersistentVolume (pv) | v1 | Cluster-level piece of storage; bound by a PersistentVolumeClaim. |
| PersistentVolumeClaim (pvc) | v1 | Request for storage; bound to a PersistentVolume or dynamic provisioner. |
| StorageClass (sc) | storage.k8s.io/v1 | Defines a class of storage for dynamic provisioning of PVCs. |
Pods in more detail
- Lifecycle phases: Pending → Running (or Succeeded/Failed for one-off pods). A pod stays Pending until it is scheduled and its images are pulled; it is Running once at least one container has started.
- Init containers: Run to completion before the main containers start. Use them for setup (e.g. run DB migrations, wait for a dependency). They run in order; if one fails, the kubelet retries it according to the pod's restartPolicy.
- Multiple containers in a pod: Share the same network namespace (localhost) and can share volumes. Typical pattern: main app + sidecar (e.g. log shipper, proxy). With restartPolicy Always or OnFailure, the kubelet restarts the individual container that exited, not the whole pod.
spec:
initContainers:
- name: init-db
image: busybox
command: ['sh', '-c', 'until nslookup db; do sleep 2; done']
containers:
- name: app
image: my-app:latest
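The sidecar pattern mentioned above can be sketched similarly: an app container and a log-shipper container sharing an emptyDir volume (images, paths, and file names here are illustrative):

```yaml
# Sidecar sketch: both containers mount the same emptyDir volume
spec:
  containers:
  - name: app
    image: my-app:latest        # placeholder image
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  - name: log-shipper
    image: busybox              # stand-in for a real log agent
    command: ['sh', '-c', 'tail -F /var/log/app/app.log']
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}
```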
Workload resources: ReplicaSet, Job, DaemonSet, StatefulSet
| Resource | Use case |
|---|---|
| ReplicaSet | Keep N identical pod replicas; use via Deployment, not alone. |
| Job | Run a batch task until success (e.g. backup, migration). completions, parallelism, backoffLimit. |
| CronJob | Run a Job on a schedule (e.g. "0 * * * *" every hour). |
| DaemonSet | One pod per node (e.g. node exporter, log collector, CNI). |
| StatefulSet | Stateful apps with stable identity: stable pod name and storage, ordered create/delete. |
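The Job knobs from the table (completions, parallelism, backoffLimit) look like this in a manifest; the name and image are placeholders:

```yaml
# Batch Job: 5 successful completions, 2 pods at a time, up to 3 retries
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate            # illustrative name
spec:
  completions: 5
  parallelism: 2
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never    # Jobs require Never or OnFailure
      containers:
      - name: migrate
        image: my-migrator:1.0   # placeholder image
```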
Services and networking
- ClusterIP: Default. A virtual IP inside the cluster; pods reach the service by DNS name (`<svc>.<ns>.svc.cluster.local`). Endpoints are created automatically from the service selector and list the backing pod IPs.
- NodePort: Exposes the service on each node’s IP at a static port (30000–32767). Good for dev or when you don’t have a load balancer.
- LoadBalancer: Provisions an external load balancer (cloud or on-prem). Often used with Ingress for HTTP(S).
- Headless service: `clusterIP: None`. No cluster IP; DNS returns all pod IPs. Used for StatefulSets or when clients need to talk to specific pods.
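A headless service differs from a normal one only by `clusterIP: None`; a sketch (service name, selector, and port are illustrative):

```yaml
# Headless service: DNS returns the pod IPs directly, no virtual IP
apiVersion: v1
kind: Service
metadata:
  name: db-headless      # illustrative name
spec:
  clusterIP: None
  selector:
    app: db
  ports:
  - port: 5432
```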
Ingress exposes HTTP(S) routes to services. An Ingress controller (e.g. NGINX, Traefik) watches Ingress resources and configures the load balancer. One Ingress can route multiple hosts/paths to different ClusterIP services.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-ingress
spec:
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-app-svc
port:
number: 80
ConfigMap and Secret (configuration)
- ConfigMap: Store non-sensitive config (URLs, feature flags, config files). Mount as a volume or inject as environment variables. Updates to a ConfigMap mounted as a volume are eventually synced into running pods (subPath mounts and environment variables are not updated after pod start).
- Secret: Same idea for sensitive data (passwords, TLS certs). Stored base64-encoded, not encrypted by default; enable encryption at rest for the API server in production. Mount as a volume or env; prefer projected volumes or external secret operators for rotation.
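Both consumption styles can be sketched together; the ConfigMap name, keys, and image below are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config          # illustrative name
data:
  LOG_LEVEL: "info"
  app.properties: |
    feature.x=true
---
# In the pod spec: inject one key as an env var, mount the whole map as files
spec:
  containers:
  - name: app
    image: my-app:latest    # placeholder image
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: LOG_LEVEL
    volumeMounts:
    - name: config
      mountPath: /etc/config
  volumes:
  - name: config
    configMap:
      name: app-config
```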
Volumes and persistent storage
- emptyDir: Temporary directory per pod; deleted when the pod is removed. Good for scratch space or sharing data between containers in a pod.
- PersistentVolumeClaim (PVC): Request storage (size, StorageClass). The cluster binds it to a PersistentVolume (PV) or triggers dynamic provisioning. Pods mount the PVC; data survives pod restarts.
- StorageClass: Defines a provisioner and parameters (e.g. cloud disk type). When you create a PVC that references a StorageClass, the provisioner creates the backing volume and binds the PVC.
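Putting PVC and StorageClass together: a claim that triggers dynamic provisioning, and a pod mounting it. The class name `standard` is an assumption (available classes vary per cluster), and the image is a placeholder:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard     # assumed class name; check `kubectl get sc`
  resources:
    requests:
      storage: 1Gi
---
# In the pod spec: mount the claim; data survives pod restarts
spec:
  containers:
  - name: app
    image: my-app:latest         # placeholder image
    volumeMounts:
    - name: data
      mountPath: /var/lib/app
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc
```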
Minimal Deployment example
Save as app-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
labels:
app: my-app
spec:
replicas: 2
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: app
image: my-registry.io/my-app:latest
ports:
- containerPort: 3000
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
Create the deployment:
kubectl apply -f app-deployment.yaml
Exposing the app with a Service
apiVersion: v1
kind: Service
metadata:
name: my-app-svc
spec:
selector:
app: my-app
ports:
- port: 80
targetPort: 3000
type: ClusterIP # or NodePort / LoadBalancer
kubectl apply -f app-service.yaml
Essential kubectl commands
# List pods (default namespace)
kubectl get pods
# List pods in a namespace
kubectl get pods -n production
# Describe a pod (events, state, details)
kubectl describe pod <pod-name>
# View logs from a pod
kubectl logs <pod-name>
# Follow logs (like tail -f)
kubectl logs -f <pod-name>
# Execute a command in a pod
kubectl exec -it <pod-name> -- sh
# List deployments
kubectl get deployments
# Scale a deployment
kubectl scale deployment my-app --replicas=5
# Delete a deployment and its pods
kubectl delete deployment my-app
Pod lifecycle and restarts
- Kubernetes keeps the number of replicas you specified; if a pod exits or fails, it is replaced.
- livenessProbe and readinessProbe tell Kubernetes when to restart a pod or when to send traffic:
containers:
- name: app
image: my-app:latest
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 2
periodSeconds: 5
Namespaces
# List namespaces
kubectl get namespaces
# Create a namespace
kubectl create namespace staging
# Run a one-off pod in a namespace
kubectl run debug --image=busybox -n staging -- sleep 3600
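Namespaces can also be created declaratively, which keeps them in version control alongside the rest of your manifests:

```yaml
# Declarative equivalent of `kubectl create namespace staging`
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  labels:
    env: staging    # illustrative label
```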
Security for Kubernetes
Securing a Kubernetes cluster involves the control plane, the nodes, the network, and the workloads. Below are the main areas and practices.
1. RBAC (Role-Based Access Control)
RBAC controls who can do what in the cluster (e.g. list pods, create deployments). Define Roles (or ClusterRoles) and RoleBindings (or ClusterRoleBindings) to grant permissions to users, groups, or service accounts.
# Example: Role that allows reading pods in a namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: production
name: pod-reader
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: read-pods
namespace: production
subjects:
- kind: ServiceAccount
name: ci-bot
namespace: production
roleRef:
kind: Role
name: pod-reader
apiGroup: rbac.authorization.k8s.io
- Principle of least privilege: Grant only the permissions needed.
- Prefer namespaced Roles/RoleBindings; use ClusterRole/ClusterRoleBinding only for cluster-wide access (e.g. admin, node viewer).
2. Secrets management
- Kubernetes Secrets store sensitive data (passwords, tokens, TLS certs) as base64; they are not encrypted at rest by default. Enable encryption at rest for the API server (e.g. with a KMS provider) in production.
- Avoid putting secrets in plain YAML in Git. Use external secret managers (e.g. HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) with operators (e.g. External Secrets Operator) to sync into Kubernetes Secrets.
- Prefer projected volumes or CSI secret stores so pods get only the secrets they need.
# Mount a secret as a file in a pod
spec:
containers:
- name: app
volumeMounts:
- name: db-secret
mountPath: /etc/secrets
readOnly: true
volumes:
- name: db-secret
secret:
secretName: db-credentials
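A Secret like the `db-credentials` referenced above could be defined as follows. The `stringData` field accepts plain text and the API server stores it base64-encoded; the values here are illustrative, and as noted you should avoid committing real ones to Git:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  username: app_user     # illustrative value
  password: change-me    # illustrative value
```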
3. Network policies
By default, pods in a cluster can often talk to any other pod. NetworkPolicy restricts ingress/egress traffic (e.g. only allow frontend → backend, block cross-namespace traffic).
# Allow only pods with label role=frontend to reach pods with label app=api on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-allow-frontend
namespace: production
spec:
podSelector:
matchLabels:
app: api
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
role: frontend
ports:
- protocol: TCP
port: 8080
- Enforcing NetworkPolicy requires a CNI plugin that supports it (e.g. Calico, Cilium).
- Start with a deny-by-default policy and add explicit allow rules for critical namespaces.
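A deny-by-default baseline is a NetworkPolicy with an empty pod selector and no ingress rules; allow policies (like the frontend→api one above) then punch specific holes:

```yaml
# Deny all ingress to every pod in the namespace; add allow rules on top
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}     # empty selector = all pods in the namespace
  policyTypes:
  - Ingress
```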
4. Pod security (security context and Pod Security Standards)
- Security context on pods/containers: run as a non-root user (`runAsNonRoot`, `runAsUser`), drop capabilities (`securityContext.capabilities.drop: ["ALL"]`), use a read-only root filesystem where possible.
- Pod Security Standards (PSS): Privileged, Baseline, Restricted. Enforce via Pod Security Admission (labels on namespaces) or a policy engine (e.g. OPA Gatekeeper, Kyverno).
# Example: restricted-style pod
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
readOnlyRootFilesystem: true
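Enforcing the Restricted profile via Pod Security Admission is done with labels on the namespace; a sketch:

```yaml
# Namespace labels for Pod Security Admission: reject pods that violate
# the Restricted profile, and warn on violations as well
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
```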
5. Image security
- Use private or trusted registries; avoid the `latest` tag in production.
- Scan images (e.g. Trivy, Snyk) in CI and at admission (e.g. Trivy admission controller, Gatekeeper) to block vulnerable images.
- Sign and verify images: use Cosign and policy-controller (or similar) so only signed images are allowed.
6. Control plane and node hardening
- API server: Restrict access (firewall, private endpoints); enable audit logging; use admission controllers (e.g. PodSecurity, validating webhooks) to enforce policies.
- etcd: Encrypt at rest; restrict network access to API server only.
- Nodes: Keep OS and kubelet/runtime updated; use node hardening (CIS benchmarks); consider read-only root filesystem and minimal images for the host where possible.
- kubelet: Disable anonymous access (authentication.anonymous.enabled: false in the kubelet config); use the NodeRestriction admission plugin to limit what kubelets can modify.
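The kubelet hardening above lives in the kubelet's config file; a sketch of the relevant fragment (field names per the kubelet.config.k8s.io/v1beta1 API):

```yaml
# Kubelet config fragment: no anonymous access; delegate authorization
# of API requests to the API server via webhook
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
authorization:
  mode: Webhook
```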
7. Summary: security checklist
| Area | Practices |
|---|---|
| Access | RBAC with least privilege; avoid cluster-admin in production. |
| Secrets | Encryption at rest; external secret manager; minimal exposure to pods. |
| Network | NetworkPolicy for segmentation; restrict egress where possible. |
| Workloads | Non-root, drop capabilities, read-only root; enforce PSS (Baseline/Restricted). |
| Images | Scan in CI and at admission; sign and verify images. |
| Cluster | Harden API server, etcd, and nodes; audit logs and admission control. |
Summary
- Architecture: Control plane (API server, etcd, scheduler, controllers) manages the cluster; data plane (kubelet, kube-proxy, container runtime) runs pods on nodes.
- Pods run your containers (init containers, multi-container pods); Deployments manage replicas and rolling updates via ReplicaSet; Services (ClusterIP, NodePort, LoadBalancer, headless) and Ingress expose pods on the network.
- Workloads: Use Job/CronJob for batch/scheduled tasks, DaemonSet for one pod per node, StatefulSet for stateful apps with stable identity and storage.
- Config: ConfigMap and Secret inject configuration; PersistentVolumeClaim and StorageClass provide persistent storage.
- Use kubectl to apply YAML and inspect pods, deployments, services, and other resources.
- Security: Use RBAC, NetworkPolicy, Secrets (with encryption), pod security contexts and PSS, image scanning, and control-plane/node hardening for production.
For production, you typically add ConfigMaps, Secrets, and Ingress, and run on a managed cluster (e.g. AKS, EKS, GKE); for learning, a local setup like minikube or kind works well.
References
- Kubernetes in Action (2nd ed.), Marko Lukša, Manning — comprehensive coverage of Pods, ReplicaSet, Deployment, Service, Endpoints, Ingress, ConfigMap, Secret, PV/PVC, StorageClass, Job, CronJob, DaemonSet, StatefulSet, and more.