Self-Hosted CI/CD Agent with AKS and Azure DevOps – Part 2

This post continues from Part 1: implementing a self-hosted agent on AKS and Azure DevOps.

Public/Shared vs Self-Hosted Agent

With CI/CD systems like Jenkins, GitHub Actions, GitLab, or Azure DevOps, there are generally two options:

Public / shared agent — managed by the provider
Self-hosted agent — you install and manage it

Main differences:

Data security: With a self-hosted agent, repository and build data stay on your infrastructure. With public/shared agents, that data is processed on machines the provider operates, so a breach on the provider's side can expose it (such incidents have happened in the past).
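Quota: With self-hosted (private) agents you are not limited by the provider's hosted pipeline minutes; capacity depends on your own hardware, although some providers still license the number of parallel jobs.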

Two self-hosted options in this series

Self-hosted on a VM: e.g. Windows VM, Ubuntu VM
Self-hosted on Kubernetes (AKS, EKS): the approach in this post. It suits clusters without dockershim (removed in Kubernetes 1.24), where builds cannot rely on a Docker daemon on the node.

Prerequisites

Azure DevOps — sign up at dev.azure.com. It helps to understand Azure DevOps basics before implementing.
Infrastructure: A Kubernetes cluster (local or cloud).
Repository: A repo with configuration and build manifests for an agent that uses Kaniko. I’ve published one at: github.com/quyennguyenvan/k8s_cicd_build_agents.

Brief overview of the code

aks-agent.dockerfile — Docker image used to run jobs/requests from Azure DevOps; when you deploy to an AKS/EKS cluster, the agent pod runs this image.
start.sh — Script that connects the pod to Azure DevOps (the “master”) when the CI/CD agent pod starts.
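
To make the role of start.sh more concrete, here is a minimal sketch of how such a script could register the pod with Azure DevOps. It is not the file from the repository: the environment variable names (AZP_URL, AZP_TOKEN, AZP_POOL, AZP_AGENT_NAME) and both function names are assumptions for illustration. The config.sh and run.sh scripts are the ones shipped with the Azure Pipelines agent package, and the sketch assumes they sit in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's start.sh.
# AZP_URL, AZP_TOKEN, AZP_POOL and AZP_AGENT_NAME are assumed variable names.

# Register this pod as an agent. Any arguments are used as a command prefix,
# so `register_agent echo` prints the command instead of running it.
register_agent() {
  : "${AZP_URL:?set AZP_URL to the organization URL}"
  : "${AZP_TOKEN:?set AZP_TOKEN to a PAT allowed to manage agent pools}"
  "$@" ./config.sh --unattended \
    --url "$AZP_URL" \
    --auth pat --token "$AZP_TOKEN" \
    --pool "${AZP_POOL:-Default}" \
    --agent "${AZP_AGENT_NAME:-$(hostname)}" \
    --replace --acceptTeeEula
}

# Register, then hand the process over to the agent listener.
start_agent() {
  register_agent && exec ./run.sh
}
```

In a real start script you would call start_agent as the last line. Running register_agent echo first is a cheap way to check the values the pod would send, although it prints the token too, so avoid it in shared logs.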

Update/patch: handling old (offline) agents

When you update or patch agents, the old agent pods are removed, but their agent records remain in Azure DevOps with an offline status. Add a step to your update/patch process that calls the REST API to remove those offline agents.

Sample script (bash), for use in a pipeline (replace ${{ parameters.* }} with your actual variables):

ORG_URL="${{ parameters.ADO_URI }}"
PAT_TOKEN="${{ parameters.ADO_PAT_TOKEN }}"
POOL_ID="${{ parameters.POOL_ID }}"
API_VERSION="6.0"

# Basic auth uses an empty user name and the PAT as the password.
# tr strips the line wraps that base64 may add to long output.
AUTH=$(printf ':%s' "$PAT_TOKEN" | base64 | tr -d '\n')

# Get a list of all agents in the pool (-f makes curl fail on HTTP errors)
AGENTS=$(curl -fsS -H "Authorization: Basic ${AUTH}" "${ORG_URL}/_apis/distributedtask/pools/${POOL_ID}/agents?api-version=${API_VERSION}")

# Delete every agent whose status is "offline"
for AGENT_ID in $(printf '%s' "$AGENTS" | jq -r '.value[] | select(.status == "offline") | .id')
do
  echo "Deleting agent with ID $AGENT_ID"
  curl -fsS -X DELETE -H "Authorization: Basic ${AUTH}" "${ORG_URL}/_apis/distributedtask/pools/${POOL_ID}/agents/${AGENT_ID}?api-version=${API_VERSION}"
done
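
The Authorization header is the part of this script that most often goes wrong, so it can help to isolate it. The sketch below wraps the encoding in a small helper and checks it against a dummy token. The name basic_auth_value is a hypothetical one chosen for this example, and the only tools assumed are printf, base64 and tr.

```shell
#!/usr/bin/env bash
# Sketch: build the value for an "Authorization: Basic ..." header from a PAT.
# basic_auth_value is a hypothetical helper name, not part of any Azure tool.
basic_auth_value() {
  # Empty user name, then ":", then the PAT as the password.
  # tr removes the line wraps some base64 implementations add to long output.
  printf ':%s' "$1" | base64 | tr -d '\n'
}

# Example with a dummy token (never echo a real PAT into pipeline logs):
echo "$(basic_auth_value "abc")"   # prints OmFiYw==
```

Passing the token as a printf argument instead of embedding it in the format string keeps the result correct even if the token contains a % character.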

Part 3 will cover Kaniko and an example build in more detail.

quyennv.com