Ken Muse

What Is ARC Doing?

There’s an elegance in the simplicity of well-written code. This is true of Kubernetes controllers (such as ARC) as well, and understanding that simplicity makes troubleshooting easier. At its root, ARC is surprisingly simple: it’s just scaling a template.

What do I mean by this? Let’s explore!

The Templates

To understand what’s happening, you first need to know that there are three “modes” for the runner scale sets. The mode is configured by setting containerMode.type when deploying a new runner scale set:

  • Docker-in-Docker (dind)
    This mode uses a predefined template with limited configurability. It contains a DinD container and the runner container, enabling it to support Docker image builds and manual image execution.
  • Kubernetes (kubernetes)
    This mode uses a minimally configurable template that relies on using the K8S API to create containers as needed. By default, the runner coordinates the activities for the process, including creating other containers for the job, services, and steps as needed. It doesn’t have a Docker container, so it does not natively support manual Docker commands, including image building. This mode relies on persistent volume claims (PVCs) to share data between the containers and the runner.
  • Default (no mode specified)
    This is the default configuration, containing just the runner container. It may seem like this is the least functional mode because it does not support container jobs. In truth, it’s the most powerful configuration. When no containerMode type is set, ARC will utilize any user-provided template that includes the runner container.

The dind and kubernetes templates are included in the gha-runner-scale-set settings file, values.yaml, which provides more detail about what will be deployed.
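
As a rough illustration of how the three modes differ, the sketch below maps a containerMode.type value to the containers the resulting pod template would hold. The function name describe_template, the container names, and the returned dictionary keys are assumptions made for this example; the authoritative templates are the ones in values.yaml.

```python
from typing import List, Optional


def describe_template(container_mode_type: Optional[str],
                      user_containers: Optional[List[str]] = None) -> dict:
    """Summarize the pod template implied by containerMode.type (illustrative only)."""
    if container_mode_type == "dind":
        # Predefined template: the runner plus a Docker-in-Docker container.
        return {"containers": ["runner", "dind"], "job_containers_via": "docker"}
    if container_mode_type == "kubernetes":
        # Runner only; job, service, and step containers are requested
        # through the Kubernetes API, with data shared through PVCs.
        return {"containers": ["runner"], "job_containers_via": "kubernetes-api"}
    if container_mode_type is None:
        # No mode set: any user-provided template is used, as long as it
        # includes the runner container. Container jobs are not supported.
        containers = list(user_containers or ["runner"])
        if "runner" not in containers:
            raise ValueError("the template must include the runner container")
        return {"containers": containers, "job_containers_via": None}
    raise ValueError(f"unknown containerMode.type: {container_mode_type!r}")


print(describe_template("dind"))
print(describe_template(None, ["runner", "sidecar"]))
```

The point of the sketch is that the mode only selects a template. Everything ARC does afterwards is the same regardless of which branch was taken.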

The Process

Based on the scale set configuration, ARC is responsible for creating runner pods to process available jobs. That means that it monitors two sources: the GitHub services (to understand the number of jobs needed) and the Kubernetes service (to understand how many runner pods have been created and their status). It then creates new runners (or removes idle runners) to keep the number of active runners between minRunners and the smaller of maxRunners and the number of jobs waiting to be processed.
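
The scaling rule described above can be written as a simple clamp. This is a minimal sketch of the rule as stated in this article, not ARC's actual source code. The function name desired_runner_count and its parameter names are hypothetical.

```python
def desired_runner_count(min_runners: int, max_runners: int, pending_jobs: int) -> int:
    """Clamp the runner count between min_runners and min(max_runners, pending_jobs)."""
    if min_runners > max_runners:
        raise ValueError("min_runners must not exceed max_runners")
    # The upper bound is the smaller of maxRunners and the waiting jobs.
    upper = min(max_runners, pending_jobs)
    # Never drop below minRunners, even when no jobs are waiting.
    return max(min_runners, upper)


for jobs in (0, 3, 10):
    print(jobs, desired_runner_count(min_runners=1, max_runners=5, pending_jobs=jobs))
```

With minRunners set to 1 and maxRunners set to 5, three waiting jobs yield three runners, ten waiting jobs are capped at five, and an empty queue still keeps one runner available.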

If more runners are needed and there are fewer than maxRunners available, ARC creates an Ephemeral Runner custom resource. Creating this resource triggers ARC’s handling of it. This custom resource is a key part of how ARC tracks and manages the runner pods and templates. When a new resource is created, ARC calls the Kubernetes API Server to schedule a new pod based on the template for that runner scale set (along with a configuration file for the runner). It doesn’t manage the runners, assign them jobs, or do anything else at that point in the process. If ARC is running successfully, then it is just listening and requesting. This is why the majority of challenges teams face implementing ARC are related to the templates they are applying or to their Kubernetes configurations. At this point, everything else relies on core Kubernetes behaviors and scheduling until the runner pod is scheduled. This can include Kubernetes creating new nodes and waiting for suitable resources to become available.
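
To make the "listening and requesting" role concrete, here is a small sketch of the scale-up step against an in-memory stand-in for the API server. FakeCluster, scale_up, and the runner naming scheme are all hypothetical. Nothing here calls a real Kubernetes client, and the real controller is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeCluster:
    """Stand-in for the Kubernetes API server; it only records requests."""
    ephemeral_runners: List[str] = field(default_factory=list)
    pod_requests: List[str] = field(default_factory=list)

    def create_ephemeral_runner(self, name: str) -> None:
        self.ephemeral_runners.append(name)

    def request_pod(self, runner_name: str, template: str) -> None:
        # Scheduling the pod is left entirely to Kubernetes.
        self.pod_requests.append(f"{runner_name}:{template}")


def scale_up(cluster: FakeCluster, desired: int, max_runners: int, template: str) -> List[str]:
    """Create Ephemeral Runner records (and pod requests) up to the allowed count."""
    created = []
    while len(cluster.ephemeral_runners) < min(desired, max_runners):
        name = f"runner-{len(cluster.ephemeral_runners)}"
        cluster.create_ephemeral_runner(name)   # the custom resource
        cluster.request_pod(name, template)     # the pod request it triggers
        created.append(name)
    return created


cluster = FakeCluster()
print(scale_up(cluster, desired=3, max_runners=2, template="dind"))
print(cluster.pod_requests)
```

Note that the sketch never inspects whether a pod actually started. That mirrors the division of labor described above: once the request is made, scheduling is a Kubernetes concern.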

Once the pod is scheduled, the responsibilities shift to the runner. After its pod has started, the runner connects to the GitHub services to announce its availability and characteristics. At that point, it is eligible to receive a job. When it receives a job, it executes the job’s steps and reports back. ARC is not involved in these activities. If the runner needs other containers, then it invokes native Docker commands or uses hooks to request additional pods using the Kubernetes API Server. This is the nature of the dind and kubernetes templates. They are merely configurations that support those operations. All of the functionality for this is part of the runner itself.

When the runner finishes, ARC becomes involved again. When a job completes, ARC identifies the runner that is finished and validates whether it has finished being unregistered from the GitHub services. ARC then uses the Kubernetes APIs to request a deletion of all of the pods and resources associated with that runner. Once again, ARC relies on core Kubernetes behaviors to handle the processes. Each time this happens, ARC polls to discover when each runner pod (and its secrets) has been successfully cleaned up. Throughout this time, ARC is waiting on core Kubernetes functionality: the deletion and cleanup of resources. This can also take some time, since Kubernetes may need to reset allocated storage and network resources. Once Kubernetes reports that the resources have been released, ARC removes its Ephemeral Runner custom resource. With that resource removed, ARC is now free to create a new Ephemeral Runner and start the process again.
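
The cleanup sequence can be sketched as a short polling loop. The function clean_up_runner and its four callback parameters are hypothetical placeholders for the GitHub and Kubernetes calls described above. The polling limit and interval are likewise assumptions, not ARC settings.

```python
import time
from typing import Callable


def clean_up_runner(is_unregistered: Callable[[], bool],
                    request_delete: Callable[[], None],
                    resources_released: Callable[[], bool],
                    remove_ephemeral_runner: Callable[[], None],
                    poll_interval: float = 0.0,
                    max_polls: int = 10) -> bool:
    """Return True once the runner's resources are gone and its record is removed."""
    # 1. Only proceed once the runner has unregistered from the GitHub services.
    if not is_unregistered():
        return False
    # 2. Ask Kubernetes to delete the pod and its associated resources.
    request_delete()
    # 3. Poll until Kubernetes reports that everything has been released.
    for _ in range(max_polls):
        if resources_released():
            # 4. Remove the Ephemeral Runner record so a new one can be created.
            remove_ephemeral_runner()
            return True
        time.sleep(poll_interval)
    return False


events = []
polls = iter([False, False, True])  # released on the third poll
done = clean_up_runner(lambda: True,
                       lambda: events.append("delete requested"),
                       lambda: next(polls),
                       lambda: events.append("ephemeral runner removed"))
print(done, events)
```

The time spent inside the loop is time spent waiting on Kubernetes, which is why slow storage or network teardown shows up as slow runner turnover.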

In short, ARC is mostly responsible for sending create and delete requests to Kubernetes. The automation is essentially the same as using kubectl to manually create and delete runner pods based on templates. It’s a very basic set of operations, but as I said before, there is a beauty in the simplicity of well-written code. That simplicity makes ARC’s functionality easier to understand, and knowing the basic flow of the application makes it easier to identify the source of problems when troubleshooting.