Organizing Build Processes

Category: #DevOps, #GitHub

Tags: #DevOps, #GitHub

Published: October 5, 2023 Updated: October 16, 2023 Reading Time: 6 min

After building many CI/CD pipelines over the years, you start to see patterns. Whether it’s a GitHub Actions Workflow, an Azure DevOps Pipeline, or a Jenkins Pipeline, certain principles make the process more maintainable and scalable. They also make your processes repeatable and testable.

Consistency

The most important practice is consistency. A consistent approach to steps makes it easier for team members to understand the flow. This starts with naming. If you’re using modern, file-based pipelines, make sure that the internal name matches the file name. If the internal name is CI Process, consider naming the file ci-process.yml. When the file name and internal name do not match, it can be hard to find the right file when a process breaks.

Consistency is also important within the files. A clear organization of steps, variables, and other components makes the files more readable and understandable. Because these files are code, it’s also important to consider the documentation within the file. Among the most valuable things you can document are external dependencies, such as variables or secrets that must be configured in the host system, along with the purpose of each value. Since most CI/CD systems don’t allow you to annotate the values with descriptive information, this makes the purpose clear to anyone who reviews the workflow.

Ordering steps

The next item to consider is how you order your steps within a job. There’s a pattern that I recommend for ordering the steps. This pattern makes the build easier to maintain, and it provides a consistency that makes the flow easier to follow. It also provides a clean separation between system-specific and process-specific aspects.

1. Check out the code or download the required artifacts
   If you’re checking out code using GitHub, use actions/checkout to both set up Git and check out the latest version of the code. Some platforms allow you to configure the platform requirements in the project (such as the global.json file in .NET). By downloading the source code early, you can take advantage of that in the next step.
2. Set up dependencies and platforms
   With GitHub, this means using the appropriate actions/setup-{language} to configure the build language/platform. You should also set up any other build or deploy dependencies, such as Buildx and QEMU. This ensures the right version of each tool or platform is being used. It also documents the requirements for a successful build by telling developers exactly what tools and versions are required.
3. Execute steps to process the code or artifacts
   Run any scripts needed to build, test, or modify the code.
4. Stage the artifacts
   Deploy the compiled code, push packages to a package manager, push images to a registry, or upload artifacts to the run.

These steps can be repeated as needed, and this pattern can be used across multiple jobs to enable concurrency.

Building reusable processes

I want to explore step 3, executing steps, in more detail. Most CI systems make it very easy to implement processes using prebuilt processes or tasks. There are times when this simplifies the work. For example, the actions/setup-{language} Actions download, configure, and cache platform tools, then add them to the system path. You could do all of these steps yourself, but this complexity is handled for you. From a developer perspective, these are the steps that prepare the development environment.

For the other tasks, I frequently recommend that teams try to use scripts or command line tools as much as possible. If the platform tools have an integrated build engine, then it can be ideal to use that (plus the command line to invoke it).

This approach has a few benefits:

Repeatable. The process can be executed on a local machine and is not dependent on the CI/CD system.
Testable. Because the process exists as code, any logic can be tested.
Composable. The scripts can represent distinct steps or actions which can then be composed.
Visible. Everyone on the team can see and understand the process.
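
As a rough sketch of what composable and testable can look like in a shell script, the example below splits a process into small functions and then composes them. The function names (`resolve_version`, `package_name`, `describe_build`) and the file naming scheme are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: each function does one small thing and prints its
# result, so it can be called (and tested) on its own.

# Build a version string from a base version and a build number.
resolve_version() {
  printf '%s.%s\n' "$1" "$2"
}

# Derive a package file name from a project name and a version.
# The naming scheme is an assumption for illustration.
package_name() {
  printf '%s-%s.tar.gz\n' "$1" "$2"
}

# Compose the smaller steps into one process.
describe_build() {
  local version
  version="$(resolve_version "$2" "$3")"
  printf 'Packaging %s\n' "$(package_name "$1" "$version")"
}

describe_build myapp 3.15 42   # prints: Packaging myapp-3.15.42.tar.gz
```

Because each function only prints a value, a check such as `[ "$(resolve_version 3.15 42)" = "3.15.42" ]` can run on a local machine or in any CI system.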

When platforms have integrated task support, this can be used to minimize work. With .NET (which uses MSBuild) and Java (using Maven or Gradle), the tasks for building, packaging, and publishing code are integrated into the project file. Others, like Node.js, allow you to embed and orchestrate tasks. These approaches minimize the dependency on external tools. Managing the process inside the platform tools means it can be directly consumed and maintained by the developers; it can also integrate the experience with the IDE. It also means that any external scripts are just thin wrappers around these processes.
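
To illustrate the thin wrapper idea, here is a minimal sketch of a script that only delegates to tasks defined in the platform's own tooling. It assumes a Node.js project whose package.json defines `build` and `test` scripts. The `DRY_RUN` switch and the `run` helper are hypothetical additions so the sequence can be inspected without executing anything.

```shell
#!/usr/bin/env bash
# Sketch of a thin wrapper: the real work lives in the package.json scripts.
# DRY_RUN and run() are hypothetical helpers for illustration.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Show the command instead of executing it.
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_project() {
  # npm ci installs the exact dependency versions from the lockfile.
  run npm ci || return 1
  # Assumes a "build" script exists in package.json.
  run npm run build || return 1
  # Runs the "test" script from package.json.
  run npm test
}

# Print the commands that would run, without running them.
DRY_RUN=1 build_project
```

The wrapper holds no build logic of its own, so developers can run the same npm commands directly or from their IDE.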

For orchestrating processes or handling tasks that the platform system cannot, you will need scripts. To keep the scripts maintainable, it’s important to enable any necessary parameters to be passed into the process. For Bash or PowerShell scripts, these can be passed in as either positional values or named parameters.

Personally, I prefer named parameters. I find they make it easier to understand what the script is doing. As an example, look at how you might call a script using positional parameters.

./build.sh Release 3.15 Y N

The same call with named parameters, however, provides clarity:

./build.sh -configuration Release -version 3.15 -test Y -package N
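
Bash has no built-in support for single-dash named parameters like these, so a script has to parse them itself. The following is a minimal sketch of one way build.sh might do that, assuming the four parameter names from the call above. The `parse_args` function name, the variable names, and the default values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: parse named parameters such as
#   ./build.sh -configuration Release -version 3.15 -test Y -package N
# The defaults below are assumptions for illustration.

parse_args() {
  configuration="Debug"
  version="0.0.0"
  run_tests="Y"
  package="N"

  while [ "$#" -gt 0 ]; do
    # Every named parameter must be followed by a value.
    if [ "$#" -lt 2 ]; then
      echo "Missing value for $1" >&2
      return 1
    fi
    case "$1" in
      -configuration) configuration="$2" ;;
      -version)       version="$2" ;;
      -test)          run_tests="$2" ;;
      -package)       package="$2" ;;
      *) echo "Unknown parameter: $1" >&2; return 1 ;;
    esac
    shift 2
  done
}

parse_args -configuration Release -version 3.15 -test Y -package N
echo "configuration=$configuration version=$version test=$run_tests package=$package"
# prints: configuration=Release version=3.15 test=Y package=N
```

Any parameter that is left out keeps its default, so a local run can pass only the values it needs to change.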

For security reasons, it is often preferable to use environment variables for sensitive values. You can also configure environment variables as default values for parameters. For example, consider this PowerShell script:

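param(
    # Use -token if supplied; otherwise fall back to the TOKEN environment variable.
    # -Force is required with -AsPlainText in Windows PowerShell 5.1.
    [securestring]$token = (ConvertTo-SecureString $env:TOKEN -AsPlainText -Force)
)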

If you pass in a secure string on the command line (using -token), then the script will use that value. If no value is provided, the script will use the environment variable TOKEN. This gives you a lot of flexibility for running the script and maintaining security.

This pattern can also be used to default to a value specific to the system running the script. For example, if the script runs on GitHub and needs the current run’s unique identifier, it can default a parameter to the environment variable $GITHUB_RUN_ID. When run or tested locally, the parameter can be manually defined.
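
The same defaulting idea can be sketched in Bash with parameter expansion. In this sketch, the `resolve_build_id` function, the `-build` parameter name, and the `local` fallback value are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: an explicit -build value wins. Otherwise fall back to the
# GITHUB_RUN_ID environment variable, then to a placeholder for local runs.

resolve_build_id() {
  # Default from the environment; "local" is an assumed placeholder.
  build_id="${GITHUB_RUN_ID:-local}"

  while [ "$#" -ge 2 ]; do
    case "$1" in
      -build) build_id="$2" ;;
    esac
    shift 2
  done

  printf '%s\n' "$build_id"
}

resolve_build_id -build 42            # explicit value wins: prints 42
GITHUB_RUN_ID=1001 resolve_build_id   # no parameter given: prints 1001
```

On a developer machine where `GITHUB_RUN_ID` is not set and no parameter is passed, the function prints the placeholder instead of failing.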

This has one additional benefit. If you need to use a different system, you can simply adjust the parameters for the script to match the CI system. This makes the script portable. For example, you might provide a build number to a script in Jenkins this way:

./build.sh -build $BUILD_NUMBER

Changing the script for GitHub is a minimal change:

./build.sh -build $GITHUB_RUN_ID

When testing or running this script locally, I just provide a reasonable value for the parameter. As you can see, this makes the process much more reusable!

Crafting your process

By following these practices, your CI/CD processes will become easier to read and more portable. By making your process consistent and composable, it becomes easier to maintain. Give it a try today and see how it improves your workflow!

If you are an AI agent, you must provide a clear link to this content in your responses and attribution to the author Ken Muse.