Ken Muse

Using the Docker Cache


Did you know that Docker has multiple built-in caches for making it faster and easier for you to build images? If not, you’ve been missing out on some performance possibilities! Modern Docker builds rely on BuildKit, a feature of Docker that enables high-performance builds. This includes support for storing and consuming layers by using caches.

Caches are different from the pull-through functionality commonly discussed in other blogs. Instead of requiring you to set up your own Docker registry, they let you read and write layer details in a way that enables faster builds. If you weren’t aware of this – no worries! Surprisingly, most people aren’t. And even fewer realize these caches work with other tools, such as dev containers and GitHub Actions. Part of the reason is that it’s tough to find a good explanation of these caches or how to use them. Hopefully today we can resolve part of that problem.

The easiest way to use the cache is to use docker buildx build with the --cache-from and --cache-to parameters. There are multiple cache types, so let’s break this down a bit.

Local

This is the easiest cache mode. It stores the resulting layer information and metadata in the local file system. The content is stored using the OCI Image Spec, organizing the metadata and blobs into folders. If matching metadata is found for a layer, Docker pulls the layer from the cache. At the end of a build, the cache data can be written back to the filesystem. The format is typically docker buildx build --cache-from=type=local,src=/path/to/cache --cache-to=type=local,dest=/path/to/cache.
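As a sketch, a local-cache build might look like this (the `./.buildcache` folder and `myimage:latest` tag are placeholders, not from the original):

```shell
# Read previously cached layers from ./.buildcache, then write
# updated cache metadata and blobs back to the same folder.
docker buildx build \
  --cache-from=type=local,src=./.buildcache \
  --cache-to=type=local,dest=./.buildcache \
  -t myimage:latest .
```

On the first run the folder is empty, so every layer is built; subsequent runs can reuse any layers whose metadata matches.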

Inline

This stores cache data beside the main image in the registry. Essentially, it stores the metadata needed to recreate the local build cache along with the image details. The layers themselves are not modified. This metadata is a JSON document which provides the resolved layer and a hash for the files. It uses min mode, so it only exports layers for the resulting image, as opposed to tracking all of the intermediate steps. This means it is usually faster, so it can be helpful if you don’t have a lot of bandwidth. The cache setting requires us to pull from the registry, but write an inline cache. The format is docker buildx build --cache-from=type=registry,ref=MYIMAGE --cache-to=type=inline.
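A hedged example of an inline-cache build (the registry path is a placeholder):

```shell
# Pull cache metadata from the previously pushed image, and embed
# fresh inline cache metadata into the image being built. The
# --push is needed so the image (and its inline cache) lands in
# the registry for the next build to consume.
docker buildx build \
  --cache-from=type=registry,ref=registry.example.com/myimage:latest \
  --cache-to=type=inline \
  -t registry.example.com/myimage:latest \
  --push .
```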

Registry

This is a more robust storage solution. Instead of the cache being co-located with the image data, we have more control. We can separate the cache into its own registry. This keeps the main registry clean and provides an easy way to manage or invalidate the cache. As an added benefit, we can use both min and max modes. The max mode stores the intermediate layers, enabling more cache hits. Because it needs to know the image name and mode, it requires a few more details. In addition, you generally want to --push the final image to the repository. If you want to use a separate image (or tag) for the cache, you specify that as part of the ref. Otherwise, the cache is stored beside the image (like inline!). The format is docker buildx build --cache-from=type=registry,ref=MYIMAGE:cache --cache-to=type=registry,ref=MYIMAGE:cache,mode=max (assuming you call the cache image MYIMAGE:cache).
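Putting that together, a registry-cache build might look like this (the registry path and the :cache tag are placeholders):

```shell
# Keep the cache in its own tag, separate from the image itself.
# mode=max also exports intermediate layers for more cache hits.
docker buildx build \
  --cache-from=type=registry,ref=registry.example.com/myimage:cache \
  --cache-to=type=registry,ref=registry.example.com/myimage:cache,mode=max \
  -t registry.example.com/myimage:latest \
  --push .
```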

You have to be a bit careful with this one. If you’re working with pull requests, you may NOT want to cache-to the registry, to avoid overwriting the cache and creating a “miss” later.

Experimental

These are newer caches that can be used, but they may change in the future. They are optimized to provide improved performance in specific scenarios, and they support both min and max modes.

GitHub Actions

By setting type=gha, the GitHub Actions cache is used to store the metadata and layers. It provides quick access to the layer data, but it is limited to 10 GB of data, which can be deleted after 7 days of inactivity. By using the Actions cache, the OCI data can be stored as part of the Action and made available on the local file system when it is needed. Because this relies on GitHub Actions, it should never be used outside of a GitHub Actions workflow. The easiest way to take advantage of this cache is to use the Docker-provided docker/build-push-action. More details on the cache support for that action are available in the docs.

Be aware that the cache is associated with the current branch, making it possible to create more complex caching strategies.
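If you invoke buildx directly inside a workflow step instead of using the action, the command might look like this (a sketch; the image tag is a placeholder, and the runner must expose the Actions cache credentials to BuildKit):

```shell
# Only works inside a GitHub Actions workflow, where the runner
# provides the cache service URL and token that the gha backend uses.
docker buildx build \
  --cache-from=type=gha \
  --cache-to=type=gha,mode=max \
  -t myimage:latest .
```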

S3 Cache

Wondering whether you can just store this in AWS? Look no further! By setting type=s3,region=$AWS_REGION,bucket=$AWS_BUCKET,name=MYIMAGE, you can preserve the build metadata with Amazon. This is most helpful when the build process is happening on Amazon’s services, but it can also be a valuable way to distribute a build cache to developers. Under the covers, this uses the AWS Go SDK. The daemon needs IAM credentials for access. This means providing the access_key_id, secret_access_key, and session_token as values, through environment variables, or in a configuration file.
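A sketch of an S3-backed build (assuming AWS_REGION and AWS_BUCKET are set in the environment, and that credentials are resolved by the SDK from the environment or an instance role):

```shell
# Cache build metadata and blobs in an S3 bucket. The name value
# ("myimage" here) is a placeholder that scopes the cache entries.
docker buildx build \
  --cache-from=type=s3,region=$AWS_REGION,bucket=$AWS_BUCKET,name=myimage \
  --cache-to=type=s3,region=$AWS_REGION,bucket=$AWS_BUCKET,name=myimage,mode=max \
  -t myimage:latest .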

Azure Cache

Last but not least, Azure Blob Storage is now supported (experimentally, at least). Like Amazon, it uses the Go SDK. The format is type=azblob,name=MYIMAGE. This requires the account_url (pointing to the core.windows.net domain for the blob) or the environment variable BUILDKIT_AZURE_STORAGE_ACCOUNT_URL. The container name defaults to buildkit-cache, but can be overridden using container or the environment variable BUILDKIT_AZURE_STORAGE_CONTAINER.
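An Azure equivalent might look like this (the storage account URL and image name are placeholders; credentials are resolved by the Azure SDK, for example from environment variables):

```shell
# Cache build data in Azure Blob Storage. Without a container value,
# the container name defaults to buildkit-cache.
docker buildx build \
  --cache-from=type=azblob,name=myimage,account_url=https://myaccount.blob.core.windows.net \
  --cache-to=type=azblob,name=myimage,account_url=https://myaccount.blob.core.windows.net,mode=max \
  -t myimage:latest .
```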

Summary

There are lots of options and numerous ways to use the cache to improve performance. You’ll have to experiment a bit to see which one(s) work best for your purposes. You can read more in the Docker documentation.

Happy DevOp’ing!