Ken Muse

The Many SHAs of a GitHub Pull Request

Many people don’t realize it, but a pull request actually has multiple “identities” when an Actions workflow runs. Several SHAs represent different parts of the lifecycle during a continuous integration build. This can be confusing for teams that are trying to implement advanced processes, and it often leads to mistakes. In some cases, it leads to a misunderstanding: thinking that the build is testing the branch itself instead of a simulated merge into the target branch.

Another common mistake relates to using the APIs to implement Checks. GitHub associates Statuses and Checks directly with a commit, not with the pull request. Consequently, associating them with the wrong SHA will lead to the check or status not being applied or updated correctly. Depending on your branch protections, that can prevent your pull request from merging.
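Because a status is attached to a commit, a custom status has to target the pull request's head commit (github.event.pull_request.head.sha), not github.sha. The sketch below assumes the Commit Statuses REST endpoint (POST /repos/{owner}/{repo}/statuses/{sha}), a token in GITHUB_TOKEN, and the GITHUB_REPOSITORY and GITHUB_API_URL variables that Actions provides; the post_status function, the DRY_RUN switch, and the repository name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: post a commit status for a pull request.
# Assumptions: GITHUB_REPOSITORY (owner/repo) is set, GITHUB_TOKEN can write
# statuses, and GITHUB_API_URL falls back to github.com when unset.
# post_status and DRY_RUN are hypothetical names used for illustration.
set -euo pipefail

post_status() {
  # $1 should be github.event.pull_request.head.sha, NOT github.sha
  local sha="$1" state="$2" context="$3" description="$4"
  local url="${GITHUB_API_URL:-https://api.github.com}/repos/${GITHUB_REPOSITORY}/statuses/${sha}"
  local body
  # Values are not JSON-escaped; keep them free of quotes and backslashes.
  body=$(printf '{"state":"%s","context":"%s","description":"%s"}' \
    "$state" "$context" "$description")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print what would be sent instead of calling the API.
    printf '%s %s\n' "$url" "$body"
    return 0
  fi

  curl --fail --silent --show-error -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -d "$body" \
    "$url"
}

# Illustrative dry run using the head SHA from the example later in this post.
GITHUB_REPOSITORY="octo-org/octo-repo" DRY_RUN=1 \
  post_status "edf54c35c9ac4069649680e076a9c7d825edb34f" "success" "ci/build" "Build passed"
```

Passing github.sha here instead would attach the status to the temporary merge commit, which is exactly the mismatch described above.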

To understand why there are multiple SHAs being provided, you must first understand the pull request process.

The pull request process

When you create a branch, it references a SHA identifier – the most recent commit on that branch – as the HEAD. When you create a pull request, you ask to merge this HEAD into another branch, the Base. Because that also represents a point in time, the current commit for that Base branch is recorded along with the branch name. During the pull request process, the branch being merged is referred to as the pull_request.head, while the target branch is the pull_request.base.

Visualization of Base/Head relationship

The Base branch can continue to change. It may have other pull requests merged into it, or it may be updated directly. To ensure that the current branch is compatible and does not have a merge conflict, GitHub can create a test merge for validating “mergeability”. This test merge may be recreated periodically in response to changes (or when the PR is opened). The pull request’s mergeable attribute is set to true if there are no conflicts.
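The mergeable attribute is part of the pull request object returned by the REST API, so it can be inspected from a script. This is a minimal sketch assuming the "Get a pull request" endpoint (GET /repos/{owner}/{repo}/pulls/{pull_number}), a token in GITHUB_TOKEN, and GITHUB_REPOSITORY set; pull_url and get_pull are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: fetch a pull request from the REST API to inspect "mergeable".
# Assumptions: GITHUB_REPOSITORY (owner/repo) and GITHUB_TOKEN are set;
# pull_url and get_pull are hypothetical helper names.
set -euo pipefail

pull_url() {
  # $1 = pull request number
  printf '%s/repos/%s/pulls/%s\n' \
    "${GITHUB_API_URL:-https://api.github.com}" "$GITHUB_REPOSITORY" "$1"
}

get_pull() {
  # Prints the pull request JSON. "mergeable" may be true, false, or null
  # (null while GitHub is still computing the test merge in the background).
  curl --fail --silent --show-error \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "$(pull_url "$1")"
}

# Illustrative only: show the URL get_pull would request for PR #1.
GITHUB_REPOSITORY="octo-org/octo-repo" pull_url 1
```

In a workflow with jq available, something like get_pull 1 | jq '.mergeable, .merge_commit_sha' would print the flag alongside the test merge commit's SHA; a null result usually just means the request should be retried shortly.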

Visualization of test commits

To support CI builds as part of a pull request, GitHub creates a merge branch. This branch begins with the latest commit from the Base branch. It then merges the HEAD into this branch. A Git reference (ref), refs/pull/{PR}/merge, is created to point to that merge commit. This process is repeated whenever a new commit is pushed to the pull request branch (the synchronize activity of the pull_request event).
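The merge-branch process can be approximated locally with plain Git. This sketch is not GitHub’s server-side implementation; it only reproduces the same commit shape (latest Base commit as first parent, PR HEAD as second parent) in a throwaway repository and records the result under refs/pull/1/merge with git update-ref. The branch names, commit messages, and PR number are assumptions that mirror the example later in the post.

```shell
#!/usr/bin/env bash
# Local simulation of a pull request merge ref. It mimics the shape of what
# GitHub produces; it is not GitHub's implementation.
# Branch names, messages, and PR number 1 are illustrative.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main

git commit -q --allow-empty -m "Initial commit"
git checkout -q -b Feature                           # the pull request branch
git commit -q --allow-empty -m "Correct code on Feature (F2)"
git commit -q --allow-empty -m "Another change to Feature (F3)"
git checkout -q main
git commit -q --allow-empty -m "Update Main (M2)"    # Base moves after the PR opens

base_tip=$(git rev-parse main)      # latest commit on the Base branch
head_sha=$(git rev-parse Feature)   # plays pull_request.head.sha

# Start from the latest Base commit, merge the HEAD into it, and record the
# result as refs/pull/1/merge without moving any branch.
git checkout -q --detach "$base_tip"
git merge -q --no-edit -m "Merge $head_sha into $base_tip" "$head_sha"
git update-ref refs/pull/1/merge HEAD
merge_sha=$(git rev-parse refs/pull/1/merge)   # plays github.sha

git log --oneline --graph refs/pull/1/merge
```

Note that neither main nor Feature moves; the merge exists only under the pull ref, which is why the build tests a simulated merge rather than the branch itself.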

When the pull_request event is processed by the Actions workflow, most of these values are included in the event payload. This allows the workflow to access all of these states. For example, the build itself relies on the commit that the merge ref points to. At the same time, calls to the Status or Checks API need access to the SHA associated with the HEAD of the branch being merged.

A SHA for every occasion

The pull_request event exposes these values to the workflow as part of the github context.

For downloading the merged source code to support build/test, these values are used:

  • github.sha
    This is the SHA for a temporary commit created for validating the pull request. The commit represents the results of a point-in-time merge of the pull_request.head into the pull_request.base. This value is also exposed through the environment variable GITHUB_SHA. By default, actions/checkout checks out this commit into the local workspace.
  • github.ref
    This is the pull request’s merge branch, pointing to the current github.sha. It will be a string in the form refs/pull/{PR}/merge. This is also exposed through the environment variable GITHUB_REF.
  • github.ref_name
    A shortened version of the reference, in the format {PR}/merge.
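Since github.ref has the form refs/pull/{PR}/merge for this event, a step can recover the pull request number from GITHUB_REF with ordinary parameter expansion. A minimal sketch; pr_number_from_ref is a hypothetical helper, and the fallback value only exists so the script also runs outside Actions.

```shell
#!/usr/bin/env bash
# Sketch: derive the PR number from github.ref (GITHUB_REF) in a pull_request run.
# pr_number_from_ref is a hypothetical helper name.
set -euo pipefail

pr_number_from_ref() {
  local ref="$1"
  case "$ref" in
    refs/pull/*/merge)
      ref="${ref#refs/pull/}"        # strip the leading "refs/pull/"
      printf '%s\n' "${ref%/merge}"  # strip the trailing "/merge"
      ;;
    *)
      return 1                       # not a pull request merge ref
      ;;
  esac
}

# Outside of Actions, fall back to an illustrative value.
ref="${GITHUB_REF:-refs/pull/1/merge}"
if pr=$(pr_number_from_ref "$ref"); then
  echo "Testing merge commit ${GITHUB_SHA:-<unset>} for PR #$pr"
else
  echo "$ref is not a pull request merge ref" >&2
fi
```

The same trimming works on GITHUB_REF_NAME ({PR}/merge) with ${GITHUB_REF_NAME%/merge}; matching the full pattern first avoids silently producing garbage for push-triggered runs.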

To provide access to the sources of the merge, github.event.pull_request exposes details about the Base and HEAD:

  • github.event.pull_request.base.ref
    The name of the Base branch being targeted by the pull request.
  • github.event.pull_request.base.sha
    The SHA that represents the HEAD for the Base branch at the time the pull request was created.
  • github.event.pull_request.head.ref
    The name of the branch being merged into the Base.
  • github.event.pull_request.head.sha
    The SHA of the HEAD commit of the branch that is being merged into the Base.

There is one more SHA value worth knowing about:

  • github.event.pull_request.merge_commit_sha
    This is a temporary commit that is created behind the scenes for the test merge that validated no conflicts exist with the base branch. It is not committed to the repository. After the PR is merged, this value instead represents the SHA of the merge commit as detailed in the REST API docs. It was briefly deprecated due to the confusion around these changing values.

During a synchronize activity, there are two more values that allow you to analyze changes to the branch being merged. This activity is raised when the pull_request.head branch changes because a new commit is pushed; it does not occur in response to changes to the pull_request.base branch.

  • github.event.after
    The new commit SHA that represents the HEAD of the branch being merged.
  • github.event.before
    The previous commit SHA for the HEAD of the branch being merged.

This illustrates the relationships:

Visualization of SHAs

The Case of the missing SHA

You may have noticed that an important SHA is missing. In the diagrams above, nothing in the event references the latest commit in Base. To find that, we need to go back to the basics of Git. If you run git log --all --pretty=oneline --decorate=short in the workflow (with the full history checked out, for example by setting fetch-depth: 0 on actions/checkout), you will see that the relationship is clearly described and that the details are all part of the history. You’ll get something like this:

1b4eae30aab1738ebac217c65dd0cb428613c5e9 (HEAD, refs/remotes/pull/1/merge) Merge edf54c35c9ac4069649680e076a9c7d825edb34f into 9f16e0f67ef1943d536e54ca57608a92d160f09f
edf54c35c9ac4069649680e076a9c7d825edb34f (refs/remotes/origin/Feature) Another change to Feature (F3)
6336d94196cec5c6a59330ae19caa3cb408a70da Correct code on Feature (F2)
9f16e0f67ef1943d536e54ca57608a92d160f09f (refs/remotes/origin/main) Update Main (M2)
a92fb0992fdd35f09778849f1fc7c60dee9eea75 Initial commit

In this example:

  • 1b4eae is the github.sha
  • edf54c is the github.event.pull_request.head.sha
  • a92fb0 is the github.event.pull_request.base.sha
  • 9f16e0 is the current version of main (which is mentioned in the message of the github.sha commit)

This means we can extract what we need directly from the cloned repo.

Extracting the Base SHA

One option for retrieving the value is to use git rev-parse:

  • git rev-parse ${{ github.sha }}^ - retrieve the first parent (typically matches M2)
  • git rev-parse ${{ github.sha }}^2 - retrieve the second parent (typically matches github.event.pull_request.head.sha)
  • git rev-parse ${{ github.sha }}^@ - retrieve all parents, delimited with a new line
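The three selectors are easiest to compare on a real merge commit. This self-contained sketch builds a throwaway repository whose shape loosely mirrors the example (one Base commit after the branch point, one commit on Feature), merges with the Base tip as first parent, and runs each form. The layout and messages are illustrative assumptions. In a workflow, the parent commits must exist locally; actions/checkout fetches a single commit by default, so a larger fetch-depth (for example, 0) is assumed.

```shell
#!/usr/bin/env bash
# Demo of ^, ^2 and ^@ on a merge commit in a throwaway repository.
# The repository layout is illustrative, not taken from a real workflow.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty -m "Initial commit"
git branch Feature                                   # Feature starts here
git commit -q --allow-empty -m "Update Main (M2)"    # the Base moves on
base_tip=$(git rev-parse HEAD)
git checkout -q Feature
git commit -q --allow-empty -m "Another change to Feature (F3)"
head_sha=$(git rev-parse HEAD)                       # plays pull_request.head.sha

git checkout -q --detach "$base_tip"
git merge -q --no-edit -m "Merge $head_sha into $base_tip" "$head_sha"
merge_sha=$(git rev-parse HEAD)                      # plays github.sha

first=$(git rev-parse "$merge_sha^")     # first parent: the Base commit
second=$(git rev-parse "$merge_sha^2")   # second parent: the PR head
all=$(git rev-parse "$merge_sha^@")      # every parent, one per line
printf 'first:  %s\nsecond: %s\nall:\n%s\n' "$first" "$second" "$all"
```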

Combined with a grep command in Bash, you can automatically filter out the head.sha:

export LATEST_BASE_COMMIT=$(git rev-parse ${{ github.sha }}^@ | grep -Fvx ${{ github.event.pull_request.head.sha }})

The grep options being used:

  • -F Treat the PATTERN string as a fixed value instead of a regular expression
  • -v Invert the match (find lines NOT matching the github.event.pull_request.head.sha)
  • -x Match an entire line
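The effect of the three options is easiest to see on plain text. This sketch feeds grep the two parent SHAs from the example above, in the order git rev-parse <merge>^@ would print them, and keeps whatever is not the head SHA. The hard-coded values stand in for the workflow expressions.

```shell
#!/usr/bin/env bash
# Sketch: the grep filter on its own, using the parent SHAs from the example.
set -euo pipefail

head_sha="edf54c35c9ac4069649680e076a9c7d825edb34f"   # github.event.pull_request.head.sha

# -F fixed string, -v keep non-matching lines, -x whole-line match.
# "--" ends option parsing in case the pattern ever starts with "-".
LATEST_BASE_COMMIT=$(printf '%s\n' \
  "9f16e0f67ef1943d536e54ca57608a92d160f09f" \
  "$head_sha" | grep -Fvx -- "$head_sha")

echo "$LATEST_BASE_COMMIT"
```

Because of -x, a shortened SHA such as edf54c would not remove the full 40-character line, so pass the complete head SHA. Also note that grep exits with status 1 when no lines are left, which stops a script running under set -e.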

If you prefer PowerShell, you can use the following to extract the value:

$latest = git rev-parse '${{ github.sha }}^@' | Where-Object { $_ -notmatch '${{ github.event.pull_request.head.sha }}' }

Another option is to use git rev-list --parents -1 ${{ github.sha }}. This returns a space-delimited list with three values: the github.sha, the latest commit on github.event.pull_request.base.ref, and github.event.pull_request.head.sha. For example:

1b4eae30aab1738ebac217c65dd0cb428613c5e9 9f16e0f67ef1943d536e54ca57608a92d160f09f edf54c35c9ac4069649680e076a9c7d825edb34f
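Because the output is the commit followed by its parents on a single line, the three values can be split with Bash’s read builtin. This minimal sketch parses the sample line above; in a workflow you would capture it with line=$(git rev-list --parents -1 "$GITHUB_SHA") instead. PR_HEAD_SHA is a hypothetical variable you might populate from github.event.pull_request.head.sha to confirm the parent order.

```shell
#!/usr/bin/env bash
# Sketch: split `git rev-list --parents -1 <merge sha>` output into named SHAs.
# The sample line is the output shown above; PR_HEAD_SHA is a hypothetical
# variable holding github.event.pull_request.head.sha.
set -euo pipefail

line="1b4eae30aab1738ebac217c65dd0cb428613c5e9 9f16e0f67ef1943d536e54ca57608a92d160f09f edf54c35c9ac4069649680e076a9c7d825edb34f"

# Field order: the commit itself, its first parent, its second parent.
read -r merge_sha base_tip head_sha <<< "$line"

# Optional sanity check: the second parent should be the PR head.
if [ -n "${PR_HEAD_SHA:-}" ] && [ "$head_sha" != "$PR_HEAD_SHA" ]; then
  echo "Unexpected parent order: $head_sha != $PR_HEAD_SHA" >&2
  exit 1
fi

echo "github.sha:          $merge_sha"
echo "latest Base commit:  $base_tip"
echo "PR head:             $head_sha"
```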

There’s more where that came from

I should mention that there is one set of SHAs that is not preserved throughout the process. With each update to the pull request branch, an event is raised for the synchronize activity. The current Base and branch Head are processed into a test merge for the event. While the before/after attributes track changes to the branch being merged, there are no attributes that specifically track changes to the Base branch commits. In the example above, adding M2 would not trigger a new pull request workflow run. When F3 was added, the resulting run captured the update from F2 to F3, but not the shift in the Base from M1 to M2.

You could logically deduce the value using the Git command line, but it requires some effort. Because this offers little value, it’s generally not recommended. In most cases, you’re really interested in what changed on your branch between the two most recent commits. That detail is easier to capture:

git diff ${{ github.event.before }} ${{ github.event.after }}
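To list only which files changed between two pushes, the diff can be narrowed with --name-only. The sketch below builds a throwaway repository to stand in for the workflow checkout; BEFORE and AFTER play the roles of github.event.before and github.event.after, and changed_between is a hypothetical helper. The existence check is a precaution based on an assumption: after a force push, the before commit may not be present in the local clone, and a shallow checkout may not contain it either.

```shell
#!/usr/bin/env bash
# Sketch: list files changed on the PR branch between two pushes.
# A throwaway repository stands in for the workflow checkout; BEFORE and AFTER
# play the roles of github.event.before and github.event.after.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo "first" > app.txt
git add app.txt
git commit -q -m "Correct code on Feature (F2)"
BEFORE=$(git rev-parse HEAD)
echo "second" > notes.txt
git add notes.txt
git commit -q -m "Another change to Feature (F3)"
AFTER=$(git rev-parse HEAD)

changed_between() {
  # Assumption: "before" may be missing locally (force push or shallow clone).
  if ! git cat-file -e "$1^{commit}" 2>/dev/null; then
    echo "commit $1 is not available locally" >&2
    return 1
  fi
  git diff --name-only "$1" "$2"
}

changed_between "$BEFORE" "$AFTER"
```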

Having it all

As you can see, there are a lot of commits and SHAs that are part of the pull request process. Hopefully this has made it a bit easier to understand how and when to use each value.

Happy DevOp’ing!