Caching Repositories on GitHub Runner Custom Images

Category: #DevOps #GitHub

Tags: #DevOps #GitHub

Published: December 24, 2025 Reading Time: 6 min

This is a post in the series Custom GitHub Runner Images. The posts in this series include:

Dec 5, 2025 - Custom GitHub Runner Images With Pre- and Post-Job Scripts
Dec 15, 2025 - Using GitHub Custom Images for Workflow Validation
Dec 17, 2025 - Pre-Caching Docker Images on GitHub Runner Custom Images
Dec 19, 2025 - Using GitHub Custom Images with OIDC
Dec 22, 2025 - Masking Sensitive Information on GitHub Runner Custom Images
Dec 24, 2025 - Caching Repositories on GitHub Runner Custom Images
Dec 26, 2025 - Deploying Services on GitHub Runner Custom Images

Continuing this series on GitHub custom images, let’s tackle a common performance challenge: cloning large repositories. If you’ve ever waited several minutes for a massive monorepo to clone at the start of every workflow run, you know how frustrating this can be. In the last few posts, you learned some tricks you can use with image creation. Today, you’ll put that knowledge to work as you cache repositories directly on your custom images.

Why cache repositories?

Every time a workflow runs, the actions/checkout action clones your repository from scratch. For small repositories, this takes just a few seconds. But for large repositories – especially monorepos with years of history – this can take several minutes. When you’re running dozens or hundreds of workflows per day, that time can add up quickly.

The solution is to pre-cache the repository on your custom image. When the workflow runs, Git uses the cached repository as a reference, downloading only the objects that have changed since the image was created. This reduces clone times from minutes to seconds.

How reference clones work

Git has a built-in feature called reference clones (or “alternate object directories”) that makes this possible. When you clone with the --reference flag, Git looks for objects in the reference repository before downloading them from the remote. If the object already exists locally, Git skips the download entirely. The basic syntax is git clone --reference /path/to/cached-repo https://github.com/owner/repo.git.

Using a reference means you only download the commits, trees, and blobs that were created after your cached copy was made. For example, suppose you have a 10 GB repository and the latest commits added 50 MB. Cloning with a reference means you download only those 50 MB of new data instead of the full 10 GB, making the clone operation much faster.
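To see the effect without touching GitHub, you can reproduce it locally. The following is a minimal sketch, assuming Git is installed and using throwaway repositories in a temporary directory. The names origin, cache.git, and clone, along with the git_demo helper, are placeholders for illustration only. It uses file:// URLs so that Git goes through its normal transfer protocol rather than the local-copy optimization it applies to plain paths.

```shell
#!/bin/sh
set -eu

work="$(mktemp -d)"

# Helper (hypothetical): run git with a throwaway identity so commits work anywhere.
git_demo() {
  git -c user.name="Demo" -c user.email="demo@example.com" "$@"
}

# 1. A stand-in for the remote repository, with one commit.
git init --quiet "$work/origin"
echo "v1" > "$work/origin/file.txt"
git_demo -C "$work/origin" add file.txt
git_demo -C "$work/origin" commit --quiet -m "Initial commit"

# 2. The cache, taken before any new work lands (as the image build would do).
git clone --quiet --mirror "file://$work/origin" "$work/cache.git"

# 3. New work lands on the remote after the cache was created.
echo "v2" >> "$work/origin/file.txt"
git_demo -C "$work/origin" commit --quiet -a -m "Later commit"

# 4. Clone using the cache as a reference.
git clone --quiet --reference "$work/cache.git" "file://$work/origin" "$work/clone"

# The clone records the borrowed object store in its alternates file.
alternates="$(cat "$work/clone/.git/objects/info/alternates")"
clone_head="$(git -C "$work/clone" rev-parse HEAD)"
origin_head="$(git -C "$work/origin" rev-parse HEAD)"

echo "alternates:  $alternates"
echo "clone HEAD:  $clone_head"
echo "origin HEAD: $origin_head"
```

The alternates file is how the clone remembers where the borrowed objects live. Because the clone depends on that file, the cached repository has to stay in place for as long as the clone is in use.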

Security considerations

Before you implement this pattern, there’s an important security consideration: anyone who can use a runner with your custom image will have read access to the cached repository. If you have strict access controls on the repository, caching it on a shared image may not be appropriate. Consider whether every potential user of the image should have that access before you cache a repository.

Setting up the GitHub App

To clone a private repository during image creation, you’ll need authentication. The best approach is to use a GitHub App, which provides fine-grained permissions and short-lived tokens. Here’s how to set it up:

1. Create a GitHub App in your organization with the contents: read permission for repositories. The app doesn’t need any other permissions. If you’re not sure how to do this, my colleague Josh Johanning has a great guide you can use.
2. Install the GitHub App in your organization and grant it access to the specific repository you want to cache.
3. Store the app credentials as secrets in the repository where you’ll build your custom image:
   - APP_ID: The GitHub App’s ID
   - APP_PRIVATE_KEY: The GitHub App’s private key

Creating the custom image

Here’s a complete workflow that caches a repository on a custom image:

    name: Build Custom Image with Cached Repository
    on:
      workflow_dispatch:
      schedule:
        # Rebuild weekly to keep the cache fresh
        - cron: '0 0 * * 0'

    jobs:
      build-image:
        runs-on: larger-runner-demo
        snapshot:
          image-name: my-cached-image
          version: ${{ github.run_number }}
        permissions:
          contents: read
        steps:
          - name: Generate GitHub App token
            id: app-token
            uses: actions/create-github-app-token@v2.2.1
            with:
              app-id: ${{ secrets.APP_ID }}
              private-key: ${{ secrets.APP_PRIVATE_KEY }}
              owner: ${{ github.repository_owner }}
              repositories: |
                large-repo

          - name: Create cache directory
            run: mkdir -p /opt/cached-repos

          - name: Clone repository to cache
            env:
              REPO_TOKEN: ${{ steps.app-token.outputs.token }}
            run: |
              git clone --mirror \
                "https://x-access-token:${REPO_TOKEN}@github.com/your-org/large-repo.git" \
                /opt/cached-repos/large-repo

              # Remove the remote to avoid storing a token on disk
              git -C /opt/cached-repos/large-repo remote remove origin

          - name: Configure environment variable
            run: |
              echo "CACHED_LARGE_REPO=/opt/cached-repos/large-repo" | \
                sudo tee -a /etc/environment

A few things to note about this workflow:

Using --mirror creates a bare repository that includes all refs, so the cache contains every branch and tag. This option is not available in actions/checkout, which defaults to a shallow clone of a single branch. If you don’t need a complete mirror, you can use the action instead. Just make sure to set persist-credentials: false and to explicitly set a path for the checkout.
Removing the remote after cloning prevents the token from being stored in the repository’s config file. The cached repository is only used as a local reference, so it doesn’t need a remote.
Adding the cache location to /etc/environment makes it available to the workflow without hardcoding paths.
Running the workflow on a schedule keeps the cache reasonably fresh, reducing the delta that needs to be downloaded.
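If you want to confirm that removing the remote really leaves nothing sensitive behind, you can simulate it locally. This is only a sketch, assuming Git is installed. FAKE_TOKEN, example.invalid, and the temporary repositories are all made up for the demonstration. It rewrites the mirror’s remote URL to mimic what a token-authenticated clone leaves in the config file, then inspects the file before and after removing the remote.

```shell
#!/bin/sh
set -eu

work="$(mktemp -d)"

# A throwaway source repository with a single commit.
git init --quiet "$work/origin"
echo "hello" > "$work/origin/file.txt"
git -C "$work/origin" add file.txt
git -C "$work/origin" -c user.name="Demo" -c user.email="demo@example.com" \
  commit --quiet -m "Initial commit"

git clone --quiet --mirror "file://$work/origin" "$work/cache.git"

# Mimic the config left behind by a token-authenticated clone (the token is fake).
git -C "$work/cache.git" remote set-url origin \
  "https://x-access-token:FAKE_TOKEN@example.invalid/your-org/large-repo.git"

before="$(grep -c "FAKE_TOKEN" "$work/cache.git/config" || true)"

# Git may print a note that branches outside refs/remotes/ were kept.
# That is the desired outcome here: the mirrored refs must survive.
git -C "$work/cache.git" remote remove origin

after="$(grep -c "FAKE_TOKEN" "$work/cache.git/config" || true)"
remotes="$(git -C "$work/cache.git" remote)"
ref_count="$(git -C "$work/cache.git" for-each-ref | wc -l | tr -d ' ')"

echo "config lines containing the token, before: $before"
echo "config lines containing the token, after:  $after"
echo "refs still in the cache: $ref_count"
```

A similar check (searching the cached repository’s config file for the token prefix) could be added as a final step of the image build if you want the workflow to fail loudly when a credential slips through.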

Using the cached repository in workflows

Once your image is ready, you can use the cached repository in your workflows. Here’s how to clone using the reference:

    jobs:
      build:
        runs-on: my-cached-image
        steps:
          - name: Clone with reference
            env:
              # GITHUB_TOKEN is not exported to run steps automatically
              GITHUB_TOKEN: ${{ github.token }}
            run: |
              git clone --reference "$CACHED_LARGE_REPO" \
                "https://x-access-token:${GITHUB_TOKEN}@github.com/your-org/large-repo.git"
              cd large-repo
              git config --global --add safe.directory "$(pwd)"
              git checkout "${GITHUB_SHA}"

The clone operation will be significantly faster because Git only downloads objects that don’t exist in the cached reference repository. Why am I not using actions/checkout? Currently, it doesn’t support the --reference option, so a manual clone is necessary. The action normally marks the working directory as a safe directory for you, so with a manual clone you’ll want to add that setting yourself to ensure the folder is trusted.

After that, you can proceed with checking out the appropriate code. If you’re planning to use a specific branch, then you can optionally add --branch <branch-name> to the clone command instead.
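If the same workflow might also run on a runner that doesn’t have the cache, Git’s --reference-if-able option is worth knowing about. It behaves like --reference when the path is usable and falls back to an ordinary clone, with an informational message, when it isn’t. The sketch below assumes a reasonably recent Git and uses a local throwaway repository in place of GitHub. The missing cache path is deliberate, and the uses_cache variable is just an illustrative name.

```shell
#!/bin/sh
set -eu

work="$(mktemp -d)"

# A throwaway source repository standing in for the remote.
git init --quiet "$work/origin"
echo "hello" > "$work/origin/file.txt"
git -C "$work/origin" add file.txt
git -C "$work/origin" -c user.name="Demo" -c user.email="demo@example.com" \
  commit --quiet -m "Initial commit"

# Deliberately point at a cache that does not exist on this machine.
CACHED_LARGE_REPO="$work/no-cache-here"

# Falls back to a normal clone when the reference repository is unusable.
git clone --quiet --reference-if-able "$CACHED_LARGE_REPO" \
  "file://$work/origin" "$work/clone"

# A non-empty alternates file means the clone is borrowing from a cache.
if [ -s "$work/clone/.git/objects/info/alternates" ]; then
  uses_cache="yes"
else
  uses_cache="no"
fi

clone_head="$(git -C "$work/clone" rev-parse HEAD)"
origin_head="$(git -C "$work/origin" rev-parse HEAD)"

echo "uses cache: $uses_cache"
echo "clone HEAD: $clone_head"
```

With plain --reference, the same command would fail outright when the path is missing. Which behavior you want depends on whether a silent slow clone or a hard failure is the better signal that a job landed on the wrong runner.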

Storage considerations

When caching repositories on custom images, keep storage limits in mind. GitHub larger runners provide storage based on the machine size, not a separate storage quota. If your cached repository is very large, you’ll need a larger runner size. For example, if you need 500 GB for your cached repository, you’ll need at least a 16-core runner (which provides 600 GB of storage).

Remember that this storage is shared between:

The operating system and pre-installed tools
Your cached repository
Any other cached dependencies (npm packages, Maven artifacts, etc.)
Working directory space for the actual job

Always plan accordingly and leave buffer space for job execution.
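A quick way to sanity-check the numbers during an image build is to compare the size of the cache with the free space left on the same volume. This is a rough sketch using the standard du and df tools. It falls back to a temporary directory when CACHED_LARGE_REPO isn’t set, and the 20% headroom threshold is an arbitrary assumption for illustration, not a GitHub recommendation.

```shell
#!/bin/sh
set -eu

# Use the cache path from the image if present; otherwise a temp dir for the demo.
cache_dir="${CACHED_LARGE_REPO:-$(mktemp -d)}"

# Size of the cache, plus free and total space on its volume, all in KiB.
cache_kb="$(du -sk "$cache_dir" | awk '{print $1}')"
free_kb="$(df -Pk "$cache_dir" | awk 'NR == 2 {print $4}')"
total_kb="$(df -Pk "$cache_dir" | awk 'NR == 2 {print $2}')"

# Hypothetical threshold: warn when less than 20% of the volume is free.
min_free_kb=$((total_kb / 5))

echo "cache size: ${cache_kb} KiB"
echo "free space: ${free_kb} KiB of ${total_kb} KiB"
if [ "$free_kb" -lt "$min_free_kb" ]; then
  echo "warning: less than 20% of the volume is free" >&2
fi
```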

Summary

Caching repositories on custom images is a powerful technique for speeding up workflows that work with large repositories. By using Git’s reference clone feature, you can reduce clone times from minutes to seconds, improving developer productivity and reducing CI/CD costs.

Author: Ken Muse