quyennv.com

Senior DevOps Engineer · Healthcare, Finance

Self-Hosted CI/CD Agent with AKS and Azure DevOps – Part 3: Kaniko in Detail

#cicd#azure-devops#aks#kubernetes#kaniko#self-hosted-agent#docker-build

This post continues from Part 1 and Part 2. Part 3 focuses on Kaniko: how it fits into the self-hosted AKS + Azure DevOps setup, and how to run it as a Kubernetes Job that builds images and pushes them to Azure Container Registry (ACR).


Why Kaniko on Kubernetes?

In Part 1 we saw that Docker-in-Docker (DinD) relies on the Docker daemon and was tied to Kubernetes’ dockershim, which was removed in Kubernetes 1.24. Running a full Docker daemon inside a pod also requires privileged mode and increases the attack surface.

Kaniko builds container images without a Docker daemon. It runs as a normal container, reads a Dockerfile and build context (e.g. from a volume), executes each instruction in user space, and pushes the resulting image to a registry (ACR, ECR, GCR, etc.). That makes it a good fit for Kubernetes-based CI/CD:

  • No privileged pods.
  • No dependency on a Docker or containerd socket.
  • Same cluster and namespace as your self-hosted agents; the pipeline job simply creates a Kaniko Job and waits for it to complete.

How Kaniko works (high level)

  1. Inputs: A Dockerfile and a build context (directory tree, usually from a Git checkout).
  2. Execution: Kaniko parses the Dockerfile and runs each instruction (FROM, RUN, COPY, ADD, etc.) in user space inside the executor container itself, snapshotting the filesystem after each instruction to produce the image layers.
  3. Output: It pushes the built image to a registry. No docker build or docker push on the host.
    +------------------+     +------------------+     +------------------+
    | Build context    |     | Kaniko           |     | Registry (ACR)   |
    | (Git clone or    |---->| (K8s Job)        |---->|                  |
    |  workspace)      |     | - Read Dockerfile|     | image:tag        |
    | Dockerfile       |     | - Build layers   |     |                  |
    +------------------+     | - Push image     |     +------------------+
                             +------------------+

Kaniko supports layer caching (e.g. --cache=true with a cache repository set via --cache-repo) to speed up repeated builds. It also supports multi-stage Dockerfiles.


Kubernetes Job pattern for Kaniko

A typical pattern is a Job with two containers (or an init container + Kaniko):

  1. Init container (or first container): Clones the Git repository (using a PAT and a branch or commit) into a shared emptyDir volume. This provides the build context, including the Dockerfile.
  2. Kaniko container: Runs the executor with:
    • Context: The shared volume (e.g. /workspace).
    • Dockerfile: Path inside the context (e.g. deploy/DockerfileKanikoJava17).
    • Destination: Registry and image name (e.g. miacrprd.azurecr.io/sz-repo:$(GIT_COMMIT)).
    • Authentication: ACR credentials (e.g. from a secret or service account).

The Azure DevOps pipeline job (running on the self-hosted pool) does not run docker itself; it runs kubectl apply (or a script/template that generates the Job YAML and applies it) and then waits for the Job to complete (e.g. kubectl wait --for=condition=complete job/kaniko-xyz).
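
One caveat with kubectl wait --for=condition=complete: if the Job fails, that condition never becomes true and the command blocks until its timeout. Below is a minimal sketch of a polling alternative that returns as soon as the Job reports either Complete or Failed. The function name wait_for_job, the default timeout, and the poll interval are illustrative assumptions, not part of the original template.

```shell
# Poll a Job until it reports Complete or Failed, or until the timeout expires.
# Usage: wait_for_job <job-name> <namespace> [timeout-seconds] [poll-seconds]
# Returns 0 on Complete, 1 on Failed, 2 on timeout.
# (Function name and defaults are hypothetical; adjust to your pipeline.)
wait_for_job() {
  local job="$1" namespace="$2" timeout="${3:-1800}" interval="${4:-10}"
  local elapsed=0 complete failed

  while [ "$elapsed" -lt "$timeout" ]; do
    # Each query prints "True" once the corresponding condition is set.
    complete=$(kubectl get job "$job" -n "$namespace" \
      -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}')
    failed=$(kubectl get job "$job" -n "$namespace" \
      -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')

    if [ "$complete" = "True" ]; then
      echo "Job $job completed"
      return 0
    fi
    if [ "$failed" = "True" ]; then
      echo "Job $job failed" >&2
      return 1
    fi

    sleep "$interval"
    elapsed=$((elapsed + interval))
  done

  echo "Timed out after ${timeout}s waiting for job $job" >&2
  return 2
}
```

A pipeline step could call wait_for_job kaniko-xyz "$CICD_NAMESPACE" 1800 and let the non-zero exit code fail the Azure DevOps job.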


Job manifest (conceptual)

Your pipeline references a manifest file (e.g. kubernetes-deploy/kaniko-job.yaml from variable KANIKO_MANIFEST_FILE). The template (e.g. KanikoDockerized.yaml) likely parameterises this manifest and applies it. Conceptually the Job looks like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: kaniko-build-$(REGIONAL_CODE)-$(BUILD_ID)   # unique per run/region
  namespace: $(CICD_NAMESPACE)
spec:
  ttlSecondsAfterFinished: 3600
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: git-clone
          image: alpine/git
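          args:
            - clone
            - --single-branch
            - --branch=$(BRANCH_NAME)   # one list item per argument; "--branch name" in a single item is passed to git as one invalid argument
            - --depth=1
            - https://$(PAT)@dev.azure.com/org/project/_git/$(REPO_NAME)
            - /workspace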
          volumeMounts:
            - name: workspace
              mountPath: /workspace
          env:
            - name: PAT
              valueFrom:
                secretKeyRef:
                  name: git-pat
                  key: token
      containers:
        - name: kaniko
          image: gcr.io/kaniko-project/executor:latest   # consider pinning a specific version tag for reproducible builds
          args:
            - --dockerfile=$(DOCKERFILE_URI)
            - --context=dir:///workspace
            - --destination=$(ACR_URI)/$(REGIONAL_CODE)-repo:$(TAG)
            - --cache=true
            - --cache-repo=$(ACR_URI)/cache/kaniko
          volumeMounts:               # a single volumeMounts key; a duplicate key would drop the workspace mount
            - name: workspace
              mountPath: /workspace
            - name: docker-config
              mountPath: /kaniko/.docker
              readOnly: true
          env:
            - name: DOCKER_CONFIG
              value: /kaniko/.docker
      volumes:
        - name: workspace
          emptyDir: {}
        - name: docker-config
          secret:
            name: acr-registry-secret

Notes:

  • Build context: Init container clones into /workspace; Kaniko uses --context=dir:///workspace. DOCKERFILE_URI is relative to the repo root (e.g. deploy/DockerfileKanikoJava17).
  • Destination: $(ACR_URI)/$(REGIONAL_CODE)-repo:$(TAG) matches the multi-region pipeline (e.g. miacrprd.azurecr.io/sz-repo:abc123).
  • ACR auth: Kaniko uses Docker config JSON; the secret acr-registry-secret typically holds config.json with ACR login (see Kaniko docs for ACR).
  • Caching: --cache=true and --cache-repo push cache layers to ACR so later builds can reuse them.
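
As a rough illustration of the ACR auth note above, the sketch below generates a Docker config.json from a username and password (for example a service principal or the ACR admin user). The function name docker_config_json and the variables ACR_USER and ACR_PASSWORD are hypothetical; only the secret name acr-registry-secret comes from the manifest.

```shell
# Print a Docker config.json for one registry, using basic auth.
# Usage: docker_config_json <registry> <username> <password>
# (Function name is hypothetical; credentials are whatever identity you use for ACR.)
docker_config_json() {
  local registry="$1" username="$2" password="$3" auth
  # Docker stores "username:password" base64-encoded in the "auth" field.
  auth=$(printf '%s:%s' "$username" "$password" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$registry" "$auth"
}

# Example usage (not executed here; ACR_USER and ACR_PASSWORD are assumed variables):
#   docker_config_json miacrprd.azurecr.io "$ACR_USER" "$ACR_PASSWORD" > config.json
#   kubectl create secret generic acr-registry-secret \
#     -n "$CICD_NAMESPACE" --from-file=config.json=config.json
```

Because the secret key is named config.json, mounting it at /kaniko/.docker yields /kaniko/.docker/config.json, which is where the manifest's DOCKER_CONFIG points.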

The pipeline template substitutes REGIONAL_CODE, TAG, ACR_URI, DOCKERFILE_URI, BRANCH_NAME, REPO_NAME, and ensures the Job name and namespace are set. It may also create the PAT and ACR secrets if they are not pre-created.
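
The substitution step could look something like the sketch below. It assumes the values are available as exported environment variables (as Azure DevOps does for pipeline variables) and that the manifest uses the $(NAME) token style shown above. The function name render_manifest is hypothetical. Tokens that are not listed, such as $(PAT), are deliberately left alone so that Kubernetes can expand them from the container's env at runtime.

```shell
# Replace $(NAME) tokens on stdin with the value of the exported variable NAME,
# for the names passed as arguments. Unlisted or unset names pass through unchanged.
# Usage: render_manifest NAME1 NAME2 ... < template.yaml > job.yaml
# (Function name is hypothetical; this is not the original template's code.)
render_manifest() {
  awk -v names="$*" '
    BEGIN { n = split(names, keys, " ") }
    {
      line = $0
      for (i = 1; i <= n; i++) {
        if (!(keys[i] in ENVIRON)) continue      # leave token if variable is unset
        token = "$(" keys[i] ")"
        value = ENVIRON[keys[i]]
        out = ""
        # Literal (non-regex) replacement of every occurrence of the token.
        while ((pos = index(line, token)) > 0) {
          out = out substr(line, 1, pos - 1) value
          line = substr(line, pos + length(token))
        }
        line = out line
      }
      print line
    }'
}
```

For example, render_manifest REGIONAL_CODE TAG ACR_URI DOCKERFILE_URI BRANCH_NAME REPO_NAME CICD_NAMESPACE < "$KANIKO_MANIFEST_FILE" | kubectl apply -f - would render and apply the Job in one step. Using a literal replacement rather than sed avoids having to escape slashes in registry URLs and Dockerfile paths.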


Build context and Git

The pipeline must supply:

| Input | Source | Purpose |
| --- | --- | --- |
| Repository | Azure DevOps Git | Source code and Dockerfile. |
| Branch | $(BRANCH_NAME) | Branch to clone (e.g. main). |
| Commit | $(GIT_COMMIT) or $(TAG) | Commit SHA used as image tag. |
| PAT | Variable group / secret | So the init container can clone (e.g. USER_PAT_TOKEN or PAT_GIT_TOKEN). |

The init container (or equivalent) runs inside the cluster, so it needs network access to Azure DevOps and a PAT with Code (Read) scope. The pipeline job that creates the Kaniko Job runs on the self-hosted agent pool, which has kubectl access to the cluster (via service connection or kubeconfig).
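
If the PAT secret is not pre-created, the pipeline can create or update it idempotently before applying the Job. This is a minimal sketch: the function name ensure_git_pat_secret is hypothetical, while the secret name git-pat and key token match the manifest above. Note that passing the token on the command line can expose it in the agent's process list, so treat this as a starting point rather than a hardened approach.

```shell
# Create or update the git-pat secret that the init container reads.
# Usage: ensure_git_pat_secret <namespace> <token>
# (Function name is hypothetical.)
ensure_git_pat_secret() {
  local namespace="$1" token="$2"
  # --dry-run=client renders the Secret locally; piping it to "apply" makes the
  # step safe to re-run whether or not the secret already exists.
  kubectl create secret generic git-pat \
    --namespace "$namespace" \
    --from-literal=token="$token" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

A pipeline step might call ensure_git_pat_secret "$CICD_NAMESPACE" "$PAT_GIT_TOKEN" before applying the Kaniko Job.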


Pipeline integration (Azure DevOps)

From the multi-region pipeline, the Build and Containerising stage runs one job per region:

  • Pool: azk8s-agents (self-hosted K8s agents).
  • Template: templates/KanikoDockerized.yaml.
  • Parameters: ACR_URI, REGIONAL_CODE, DOCKERFILE_URI, CONNECT_K8S_SERVICES, CICD_NAMESPACE, KANIKO_MANIFEST_FILE, SYSTEM_ACCESSTOKEN, PAT_GIT_TOKEN, REPO_NAME, BRANCH_NAME, GIT_COMMIT, TAG.

The template typically:

  1. Takes the Kaniko Job manifest (from KANIKO_MANIFEST_FILE).
  2. Substitutes parameters (region, tag, ACR, Dockerfile path, Git branch, repo, etc.).
  3. Applies the Job to the cluster (kubectl apply -f ... in the configured namespace).
  4. Waits for the Job to complete (kubectl wait or a loop checking Job status).
  5. Optionally streams logs (kubectl logs job/...) for diagnostics.

If the Job fails, the pipeline job fails and the stage can retry or fail the run.
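
One detail worth handling in the substitution step is the Job name. Kubernetes requires Job names of at most 63 characters and recommends DNS-label form (lowercase letters, digits, and hyphens), while region codes and build IDs may contain other characters. The sketch below normalises the name; the function name kaniko_job_name is hypothetical, and the kaniko-build- prefix follows the manifest above.

```shell
# Build a Job name from a region code and a build ID:
# lowercase, only [a-z0-9-], at most 63 characters, no leading/trailing hyphen.
# Usage: kaniko_job_name <regional-code> <build-id>
# (Function name is hypothetical.)
kaniko_job_name() {
  local raw
  raw=$(printf 'kaniko-build-%s-%s' "$1" "$2" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -c 'a-z0-9-' '-')
  printf '%s\n' "$raw" | cut -c1-63 | sed 's/^-*//; s/-*$//'
}
```

Truncation can in principle make two long names collide, so keep the build ID short enough to stay unique within the namespace.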


Dockerfile and base image

The pipeline variable DOCKERFILE_URI points to the Dockerfile used by Kaniko (e.g. deploy/DockerfileKanikoJava17). Use a Dockerfile that:

  • Is Kaniko-friendly: Avoid instructions that depend on a live Docker daemon (e.g. docker run inside the build). Standard FROM, RUN, COPY, ADD, ENV, EXPOSE, ENTRYPOINT/CMD are fine.
  • Uses a small, stable base if you care about image size and security (e.g. Eclipse Temurin Java 17 Alpine or distroless).

Example (conceptual):

FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

For a build stage (compile, then copy the artifact), use a multi-stage Dockerfile; Kaniko supports it. Otherwise the build context must already contain the built artifact (e.g. from a prior Maven step), or the Kaniko Job must run Maven inside the context (e.g. in the init container or a separate build step that places output in the shared volume).

In the multi-region pipeline, the Code Base Scan and Run Unit Tests stages run on Microsoft-hosted agents; the Build and Containerising stage runs on the K8s pool and expects the Kaniko Job to get its source from Git. So either the Dockerfile copies from a pre-built path that was committed, or the init container also runs the build (e.g. Maven) before Kaniko runs; which one applies depends on your repo layout and template. The important point is that the context and Dockerfile path must match what the Job provides.


Caching

Kaniko can push cache layers to a registry so the next build reuses them:

  • Cache repo: e.g. --cache-repo=miacrprd.azurecr.io/cache/kaniko. Use a dedicated repo or path to avoid polluting app images.
  • Inferred cache repo: If --cache=true is set without --cache-repo, Kaniko infers a cache repo from the --destination; check the Kaniko docs for your version.

Caching speeds up repeated builds (e.g. same base image and early layers unchanged). Ensure the ACR secret has push access to the cache repo.


End-to-end flow (one region)

  1. Pipeline triggers on main / master; Prepare Environment Builder scales agents.
  2. Run Unit Tests and Code Base Scan run on hosted or self-hosted agents as configured.
  3. Build and Containerising runs one job per region. For region sz:
    • The job runs on pool azk8s-agents.
    • The template renders the Kaniko Job with REGIONAL_CODE=sz, TAG=$(tag), ACR URI, Dockerfile path, Git repo/branch/commit, and applies it to CICD_NAMESPACE.
    • The Job’s init container clones the repo into /workspace.
    • The Kaniko container builds from dir:///workspace and pushes miacrprd.azurecr.io/sz-repo:$(GIT_COMMIT) (or $(tag)) to ACR.
    • When the Job completes, the pipeline job succeeds and the next stage (e.g. Scan docker image via Trivy) can run.
  4. Deployment and later stages use the built image per region.

Troubleshooting and limitations

  • Job not starting: Check namespace, resource quotas, and that the service account or pod has permission to pull the Kaniko and git images and to use the volume.
  • Clone failure: Verify PAT scope (Code Read), repo URL, branch name, and network from the cluster to Azure DevOps.
  • Kaniko push failure: Check ACR credentials (Docker config secret), registry URL, and that the identity has push permission to the target repo and cache repo.
  • Dockerfile not found: Ensure DOCKERFILE_URI is relative to the repo root and that the init container checked out the correct commit/branch.
  • Kaniko limitations: Some Docker features (e.g. --network=host during build) are not supported; see Kaniko documentation.
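
For the first few items above, most of the evidence is in the Job's events and in the logs of its two containers. The following sketch gathers them in one place; the function name dump_job_logs is hypothetical, and the container names git-clone and kaniko are the ones used in the manifest earlier in this post.

```shell
# Print the Job description (including events) and recent logs of both containers.
# Usage: dump_job_logs <job-name> <namespace>
# (Function name is hypothetical.)
dump_job_logs() {
  local job="$1" namespace="$2" container

  echo "----- describe: job/$job -----"
  kubectl describe "job/$job" -n "$namespace" \
    || echo "(could not describe job/$job)"

  for container in git-clone kaniko; do
    echo "----- logs: $container -----"
    # A container that never started has no logs; report that and carry on.
    kubectl logs "job/$job" -n "$namespace" -c "$container" --tail=200 \
      || echo "(no logs available for $container)"
  done
}
```

Running this from the pipeline when the Job fails, before ttlSecondsAfterFinished removes the Job, keeps the clone and build output visible in the Azure DevOps run log.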

Summary

| Topic | Detail |
| --- | --- |
| Why Kaniko | No Docker daemon, no privileged pods; builds and pushes from a K8s Job. |
| Job pattern | Init container clones Git to a shared volume; Kaniko container builds from that context and pushes to ACR. |
| Pipeline | One Azure DevOps job per region calls a template that parameterises and applies the Kaniko Job, then waits for completion. |
| Variables | KANIKO_MANIFEST_FILE, DOCKERFILE_URI, ACR_URI, REGIONAL_CODE, TAG/GIT_COMMIT, Git PAT, repo, branch, namespace. |
| Caching | Use --cache=true and --cache-repo in ACR to speed up repeated builds. |

Part 3 covered Kaniko in detail. For the full pipeline (all stages and parameters), see the multi-region CI/CD pipeline post.
