Ken Muse

Building GitHub Actions Runner Images With A Tool Cache

This week, I wanted to explore one of the most overlooked aspects of building a proper GitHub Actions Runner image: caching. The base Runner image is very small. It contains only a small subset of the files it needs for typical tasks. If you’ve built an image before, this is not unusual. In fact, you’re probably used to adding files to the image to make everything work correctly. Some of those need to be available globally, such as tar. Others, such as language SDKs, should be installed dynamically using tools.

In Actions, tools are the files that are configured by actions such as actions/setup-java and actions/setup-python. As a rule of thumb, if an action’s name starts with actions/setup-, it manages a tool. What makes tools special is how they work. Running a step with one of these actions essentially does a few things:

  1. Check whether that version of the tool already exists in the directory that the RUNNER_TOOL_CACHE environment variable points to (by default, the _tool directory in the Runner’s _work folder).
  2. If the version doesn’t exist, find the URL for the tool and download it. Unpack it into the tool cache.
  3. Add the path to the tool’s binaries to $GITHUB_PATH, configure any dependency caching, and update any required settings.
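The lookup in steps 1 and 2 can be sketched in shell. This is a simplified illustration, not the code the setup actions actually run. It assumes the <tool>/<version>/<arch> folder layout and the sibling <arch>.complete marker file that, as I understand it, the @actions/tool-cache package uses. It matches versions exactly (the real actions resolve ranges such as 18.x), and the function names cache_tool, find_cached_tool, and tool_cache_root are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified sketch of the tool cache layout used by the setup actions:
#   <cache root>/<tool>/<version>/<arch>, plus a sibling <arch>.complete marker.

# Hypothetical helper: the cache root is RUNNER_TOOL_CACHE, else <runner home>/_work/_tool.
# Assumption: the runner lives in $HOME, as it does in the actions-runner image.
tool_cache_root() {
  echo "${RUNNER_TOOL_CACHE:-$HOME/_work/_tool}"
}

# Hypothetical helper: copy an unpacked tool into the cache and mark it complete.
cache_tool() {
  local src="$1" tool="$2" version="$3" arch="$4"
  local dest
  dest="$(tool_cache_root)/$tool/$version/$arch"
  rm -rf "$dest" "$dest.complete"
  mkdir -p "$dest"
  cp -R "$src/." "$dest/"
  # Write the marker last so a half-copied folder is never treated as cached
  touch "$dest.complete"
  echo "$dest"
}

# Hypothetical helper: print the cached path, or return 1 if a download is needed.
find_cached_tool() {
  local tool="$1" version="$2" arch="$3"
  local dir
  dir="$(tool_cache_root)/$tool/$version/$arch"
  if [ -d "$dir" ] && [ -f "$dir.complete" ]; then
    echo "$dir"
    return 0
  fi
  return 1
}
```

When find_cached_tool succeeds, a setup step would append the tool’s bin folder to $GITHUB_PATH and skip the download entirely. That hit is exactly what pre-caching the tools in an image is meant to produce.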

This process makes it easy to switch between versions of the tool on the same runner. It also avoids the risks of using the ambient version of a programming language. I’ve seen far too many teams that relied on the globally installed node or dotnet get shocked when their builds broke because that global version changed. Using a tool avoids this problem!

There’s a downside to this approach, however. Each time a version of the tool is needed, multiple requests are made to resolve the version and then download the package. If you run 100 builds, that’s 100 sets of downloads. Not very efficient! The naive approach is to directly install the tool on the image. To avoid the risks of a global tool, an image is built for each variant. Unfortunately, that means that you’re building and maintaining multiple images.

A better approach is often to pre-cache the tools. While this can create a slightly larger image, it provides flexibility while minimizing download costs. In truth, the total image size is often the same or smaller than having multiple images with different versions of the tools installed globally. Actions will use the cached copy if it’s available, avoiding the download step.

So how does it work?

The Dockerfile

For this cache, the Dockerfile can be quite basic. We can start from a specific version of the actions-runner image (or the latest one), then copy a local folder of tools into the image:

FROM ghcr.io/actions/actions-runner:latest
COPY --link --chown=1001:123 tools /home/runner/_work/_tool

The --link flag tells Docker that the layer doesn’t depend on the contents of previous layers, so it can be built and reused independently of the base image. The --chown flag sets the ownership to a specific user (runner, UID 1001) and group (docker, GID 123), matching the permissions in the base image. Why not use --chown=runner:docker? Resolving those names requires details from the base image, which leads to an error at build time unless you remove --link.

Next up … populate the cache!

Building a tool cache

You could examine each tool, understand how the files are structured for that tool, and recreate it yourself. I’m not a fan of doing work that isn’t necessary, so let’s look at something easier. If you’ve ever worked with GitHub Enterprise Server, you may have come across this trick for building the tool cache folder using an Actions workflow. We’ll extend the process just a bit to be image-friendly.

Essentially, you need to create a job that does a few tasks:

  1. Replace the existing cache folder with an empty one to remove the preinstalled content
  2. Run each actions/setup- action for the tools you want to cache
  3. Archive the tool cache folder
  4. Upload the archive as an artifact

Make sure to do this on a runner that’s compatible with your image architecture so that you get the right binaries. There are a few reasons to eliminate the existing cache:

  • The preinstalled cache on GitHub-hosted runners is huge, typically over 2GB compressed (and 8GB uncompressed)
  • You want to minimize the size of the Docker image
  • You want to control what’s being cached and included

The job definition looks like this:

# Your triggers here
jobs:
  create-tool-cache:
    runs-on: ubuntu-latest
    steps:
      ## Remove any existing cached content
      - name: Clear any existing tool cache
        run: |
          mv "${{ runner.tool_cache }}" "${{ runner.tool_cache }}.old"
          mkdir -p "${{ runner.tool_cache }}"
      ## Run the setup tasks to download and cache the required tools
      - name: Setup Node 16
        uses: actions/setup-node@v4
        with:
          node-version: 16.x
      - name: Setup Node 18
        uses: actions/setup-node@v4
        with:
          node-version: 18.x
      - name: Setup Java
        uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: '21'
      ## Compress the tool cache folder for faster upload
      - name: Archive tool cache
        working-directory: ${{ runner.tool_cache }}
        run: |
          tar -czf tool_cache.tar.gz *
      ## Upload the archive as an artifact
      - name: Upload tool cache artifact
        uses: actions/upload-artifact@v4
        with:
          name: tools
          retention-days: 1
          path: ${{ runner.tool_cache }}/tool_cache.tar.gz

Why do I set retention-days to 1? If everything works in the next steps, then I don’t have a need to retain that artifact. I could delete it, but I’m giving myself 1 day in case I want to review the contents. If you need artifacts for multiple architectures, simply use a matrix to run jobs on the required hardware.
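If you do review the artifact, listing the archive is quicker than unpacking it. The sketch below is a hypothetical helper (list_cached_tools is not part of any action). It assumes the archive was produced by the tar -czf step above and that each cached tool has a <tool>/<version>/<arch>.complete marker, the convention I understand @actions/tool-cache to follow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each <tool>/<version>/<arch> stored in a tool cache archive.
# Assumption: a folder counts as cached only if a sibling <arch>.complete marker exists.
list_cached_tools() {
  local archive="$1"
  # tar -t lists member names without extracting anything
  tar -tzf "$archive" \
    | sed -e 's#^\./##' \
    | grep -E '^[^/]+/[^/]+/[^/]+\.complete$' \
    | sed -e 's#\.complete$##' \
    | sort
}
```

The first sed strips a leading ./ in case the archive was created from . instead of *. Folders that lack a marker are skipped, because the setup actions would not treat them as cached either.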

Build the image

At this point, we can use those files to build an image. For that, I’ll use a separate job. Since the last job modified the runner state a bit, I like to start fresh in a new environment to build the image. This also ensures that any extra work the setup actions might do doesn’t affect my image creation. To be clear, you could run all of this on a single runner and skip the artifact upload. To create the image, we need to do a few things:

  1. Checkout the repo (to get the Dockerfile)
  2. Download the tools artifact (so we have the tools)
  3. Unpack the tools where the Dockerfile expects to find them (in the tools folder in the workspace)
  4. Build the image with the Dockerfile we created earlier, copying the files into the image
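Step 3 is the easiest one to get subtly wrong: if the archive is unpacked one folder too deep, the image builds fine but every setup action quietly downloads its tool again. The sketch below is a hypothetical pre-build check (verify_tool_folder is an illustrative name). It assumes each <tool>/<version>/<arch> folder needs a sibling <arch>.complete marker, which is how I understand the @actions/tool-cache lookup to work.

```shell
#!/usr/bin/env bash
# Hypothetical check for a staged tool cache folder (for example, ./tools).
# Assumption: every <tool>/<version>/<arch> folder needs a sibling <arch>.complete
# marker, or the setup actions ignore it and download the tool again.
verify_tool_folder() {
  local root="$1" dir status=0 found=0
  # Directories exactly three levels below the root are <tool>/<version>/<arch>
  while IFS= read -r dir; do
    found=1
    if [ ! -f "$dir.complete" ]; then
      echo "missing marker: ${dir#"$root"/}" >&2
      status=1
    fi
  done < <(find "$root" -mindepth 3 -maxdepth 3 -type d)
  if [ "$found" -eq 0 ]; then
    echo "no cached tools found in $root" >&2
    return 1
  fi
  return "$status"
}
```

Calling it at the end of the unpack step (for example, verify_tool_folder "${{ github.workspace }}/tools") fails the job before a broken image is built, rather than after it reaches your runners.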

The job would look something like this:

  build-image:  # hypothetical job name; use any name you like
    runs-on: ubuntu-latest
    ## We need the tools archive to have been created
    needs: create-tool-cache
    env:
      # Setup some variables for naming the image automatically
      REGISTRY:  # set this to your registry host, for example ghcr.io
      IMAGE_NAME: ${{ github.repository }}
    steps:
      ## Checkout the repo to get the Dockerfile
      - name: Checkout repository
        uses: actions/checkout@v4
      ## Download the tools artifact created in the last job
      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          name: tools
          path: ${{ github.workspace }}/tools
      ## Expand the tools into the expected folder
      - name: Unpack tools
        run: |
          tar -xzf "${{ github.workspace }}/tools/tool_cache.tar.gz" -C "${{ github.workspace }}/tools/"
          rm "${{ github.workspace }}/tools/tool_cache.tar.gz"
      ## Build the image
      ## Set up BuildKit Docker container builder
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      ## Automatically create metadata for the image
      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      ## Build the image
      - name: Build and push Docker image
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: false
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

You can modify this to push your image automatically (set push to true and sign in to your registry first, for example with docker/login-action) so that it’s available for later use.

And there you have it: the automation for including the tool cache in your image. It’s worth mentioning that by layering the image this way, the base image will only be updated when a new runner version is published. This can help maximize layer caching.

Happy DevOp’ing!