Implementing Docker Layer Caching in GitHub Actions

Tags: #Containers #GitHub

Published: February 9, 2023 Reading Time: 7 min

I’m not always the most patient person. In truth, I hate waiting – especially when it comes to long builds. In Using the Docker Cache, I discussed a few of the options available for caching layers to improve performance. Today, let’s look at how to implement this with GitHub Actions and the GitHub Container Registry (GHCR).

Baseline Action

Let’s start with a simple GitHub Action workflow that will build a Dockerfile. This workflow is very similar to the default one that is provided for you by GitHub for publishing Docker containers. I explore the pattern more in this post. Essentially, it publishes an image with the name {owner}/{repo}:{branch-or-tag} to the GitHub Container Registry associated with {owner}.

The code:

name: Docker Build

on:
  workflow_dispatch:
  push:
    branches: [ "main" ]
    tags: [ 'v*.*.*' ]
  pull_request:
    branches: [ "main" ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-container:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Setup Docker buildx
        uses: docker/setup-buildx-action@v2

      - name: Log into registry ${{ env.REGISTRY }}
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

Using a Cache

To select and use a cache type, we modify the step for building and pushing the image. We just need to add two additional inputs, cache-to and cache-from. The configuration for each is a set of comma-separated name/value pairs.

The basic structure:

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from:
          cache-to:

GHA

The first cache type we’ll explore is GHA (GitHub Actions cache). This saves the metadata and blobs for the cache to the GitHub Actions cache service. The cache is limited to 10GB per repo, so it’s not a good fit for large images or repos that need to cache a large number of layers. The caches are scoped by branch, with the default branch cache being available to every branch. The details about this cache are documented here.

It has two modes for cache-to:

min: Only export layers for the resulting image (default)
max: Export all layers, including the intermediate steps

An additional parameter, scope, is also available for both cache-to and cache-from. This provides a scoping name for the cache (default: buildkit). Giving each build its own scope avoids potential cache collisions.

The configured step:

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
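
If several workflows or images in the same repository use the GHA cache, giving each one its own scope keeps them from overwriting each other's entries. Here is a minimal sketch of the same step with a scope applied. The scope name main-image is purely illustrative, not something the workflow above defines.

```yaml
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          # "main-image" is a hypothetical scope name; use the same value
          # on both sides so the importer finds what the exporter wrote.
          cache-from: type=gha,scope=main-image
          cache-to: type=gha,mode=max,scope=main-image
```

Leaving scope off of both inputs falls back to the default name, which is fine when the repository only has one Docker build.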

After running a build, the cache contents will be pushed to the Actions cache service.

[Image: Sample cache contents]

Inline

The next level of enhancement is the inline cache exporter. This one embeds the cache directly into the images themselves, enabling the registry to store the cache data. This has a few limitations:

It adds the cache data to your registry along with the images
It only supports min caching mode, so it only exports layers for the resulting image
The cache importer (cache-from) must be type=registry,ref=IMAGE_NAME

The registry ref is worth diving into a bit further.

What happens if we use type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}? If no specific version tag is provided for the ref, the importer will use latest. So, that ref is equivalent to type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest. This is usually not what we want!

We often want to reference the caches associated with the same tag. An easy solution is to use the first tag returned from the step that gathers the Docker metadata. Because we gave that step an identifier (id: meta), we can directly reference the JSON output using the expression ${{ fromJSON(steps.meta.outputs.json).tags[0] }}. This would be written like this:

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          # Import the cache embedded in the first tag produced by the metadata step
          cache-from: type=registry,ref=${{ fromJSON(steps.meta.outputs.json).tags[0] }}
          cache-to: type=inline

There may be multiple tags. As an example, a scheduled trigger will add a nightly tag in addition to the branch-specific tag. This results in:

${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:nightly
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:main

You can apply additional logic if you want to target a specific tag in the list. A better alternative is to recognize that cache-from accepts a newline-delimited list, where each line represents a separate cache importer. You could use a run step to format the tags for use with cache-from and output the values. The command-line tool jq can process the JSON data and convert it to line-delimited strings.

This approach could be implemented like this:

      - name: Format tags as registry refs
        id: registry_refs
        env:
          TAGS: ${{ steps.meta.outputs.json }}
        run: |
          # Write one "type=registry,ref=<tag>" line per tag as a multi-line step output.
          # jq -r emits raw strings (no surrounding quotes), one per line.
          {
            echo 'tags<<EOF'
            printf '%s' "$TAGS" | jq -r '.tags[] | "type=registry,ref=" + .'
            echo 'EOF'
          } >> "$GITHUB_OUTPUT"

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: ${{ steps.registry_refs.outputs.tags }}
          cache-to: type=inline

Registry

This is the last one we’ll examine today. This gives the greatest control, including:

Separate the cache data from the main image data
Change the compression algorithm to gzip, estargz, or zstd (compression=zstd)
Set the compression level to a value from 0 to 22 (compression-level=11)
Use OCI media types in the manifest (oci-mediatypes=true)
Support both min and max modes

Just like with inline, a ref must be provided. In this case, it is required for both the cache-to and cache-from. All of the details about creating and handling the ref targets apply here. There is one major difference. Because the cache is not included inline, a dedicated tag can be used. This can be a fixed image name (such as buildcache) or it can be a dynamic name (such as cache-main or cache-nightly).

Until recently, BuildKit only supported a single cache exporter (cache-to) and did not accept multiple values. That has since changed, although there are still some open issues. The Action accepts a list for cache-to so that you can use this feature. In the past, it was common to use a single cache or one cache per build event type (branch, nightly, pr, etc.).

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache
          cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache,mode=max
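
The tuning options from the list at the start of this section are appended to cache-to as additional name/value pairs. Here is a sketch, assuming zstd compression at a mid-range level suits your images. The specific values are illustrative rather than recommendations, and the level parameter is written compression-level, as I understand the BuildKit registry cache documentation to spell it.

```yaml
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache
          # Export options only apply to cache-to; the importer just needs the ref.
          # compression=zstd and compression-level=11 are example values.
          cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache,mode=max,compression=zstd,compression-level=11,oci-mediatypes=true
```

It is worth timing a few builds before settling on values, since higher compression trades CPU time on export for a smaller transfer on import.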

If you’re using multiple branches, you can define the refs in multiple ways. You can use code (similar to what we did above for the inline cache) to create dynamic values as outputs from a step. You can also use GitHub context variables to get the event type, branch name, or other values.

Using a hard-coded list with some dynamic variables might look like this:

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: |
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache-${{ github.event_name }}
          cache-to: |
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache,mode=max
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache-${{ github.event_name }},mode=max
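
For a per-branch cache, one option is to reuse the version that the metadata step has already computed, since it is already in a tag-safe form. Here is a sketch, assuming the version output of docker/metadata-action (for example main or pr-12) is what you want to key the cache on. The cache- prefix is just a naming choice.

```yaml
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          # Try the cache for this branch or PR first, then the shared cache.
          cache-from: |
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache-${{ steps.meta.outputs.version }}
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache
          # Export only to the branch-specific cache tag.
          cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache-${{ steps.meta.outputs.version }},mode=max
```

Keep in mind that the baseline workflow skips the registry login for pull requests, so PR builds would likely need to log in before they could export a cache to the registry.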

That covers the main options. I’d encourage you to explore these and see which approaches offer you the best performance gains.

Happy DevOp’ing!