Ken Muse

Creating a CodeQL Image for ARC With Python 2


In the previous post, you learned how to create a CodeQL image for ARC. In this post, I’ll show you how to extend that image to include Python 2, which is required for some CodeQL queries.

Adding Python 2

To add support for Python 2, you need to build it from source. I’m going to show you how to do that by adding a new stage to the Dockerfile. This will allow you to build Python 2 in a separate stage and then copy the necessary files into the final image. For this example, I’ll stay entirely within the Dockerfile. Feel free to optimize things! Since this code is unlikely to change any time soon, you could also build it once, cache the results, and then pull the cached binaries into the final stage.

To add Python 2, you can add the following lines to the Dockerfile from the previous post:

## Use the same base image as before. Not required, but it means there
## are fewer layers to download.
FROM ${BASE_IMAGE} AS python2

## Set the Python version to install
ENV PYTHON_VERSION=2.7.18

## And run this image as root without interactive prompts
ENV DEBIAN_FRONTEND=noninteractive
USER root

## You'll use the heredoc again. It saves you from importing a separate
## script file as a layer and it keeps you from needing to use lots of
## line separators and `&&` to chain commands together.
RUN <<EOF
  ## Update the package list and install the required packages for building the code
  apt-get update
  apt-get install -qqq -y wget build-essential checkinstall libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev libreadline-dev libnss3-dev libffi-dev zlib1g-dev

  ## The base image already has a few things installed that you can remove
  ## to make the final step in this process a bit easier.
  rm -rf /usr/local/lib/docker
  rm -rf /usr/local/lib/python3

  ## Download the source code. The environment variable is used to
  ## make it easy to change the version later if it becomes necessary.
  wget -O python2.tgz https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz

  ## Unpack the source code and move into the directory
  tar -xvf python2.tgz
  cd Python-${PYTHON_VERSION}

  ## Configure the makefile. If you don't mind a much longer build time,
  ## you can add --enable-optimizations to enable optimizations that will
  ## make the final Python binary run faster.
  ./configure

  ## Build the code and install the binaries. The `-j` option allows
  ## the build to use multiple cores, which speeds up the process.
  ## `all` builds everything, while `install` puts the binaries
  ## into their final locations for use.
  make -j`nproc` all install

  ## You also will want pip2 to be available, so download the install
  ## script for that and run it.
  wget -O get-pip.py https://bootstrap.pypa.io/pip/2.7/get-pip.py
  python2 get-pip.py

  ## Create a package that contains the Python 2 binaries and
  ## libraries. Since those folders were cleaned out earlier,
  ## the only things that will be included are parts of Python 2.
  ## This preserves the symbolic links, since copying these files
  ## in a multistage build would cause those to be dereferenced.
  tar czf /python2.tgz /usr/local/bin/ /usr/local/lib
EOF

Building the final image

The last stage is to copy the Python 2 binaries into the final image. The COPY command is normally the go-to choice for this. It also supports --link, which lets the layer be built and added independently of the underlying image. Unfortunately, the COPY command isn’t a good fit for this case. When it copies the files from the earlier stage, it dereferences the symbolic links. That means that instead of links, you end up with lots of duplicate files. This is why the previous stage created a tarball that contains the Python 2 binaries and libraries. The tarball preserves the symbolic links, so when you unpack it, the links are still there.
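To make the difference concrete, here is a minimal sketch that can be run outside of Docker. It compares a copy that follows links with a tarball round trip. The temporary directory and the file names are hypothetical stand-ins for the contents of /usr/local/bin, and `cp -RL` is used only to imitate a dereferencing copy, not as a model of how COPY is implemented.

```shell
# Illustration only: the paths and file names below are hypothetical.
set -eu
work="$(mktemp -d)"
mkdir -p "$work/src/bin" "$work/deref" "$work/untarred"

# A regular file plus a symbolic link to it, similar to python2 -> python2.7
printf 'binary\n' > "$work/src/bin/python2.7"
ln -s python2.7 "$work/src/bin/python2"

# A copy that follows links turns the link into a second regular file
cp -RL "$work/src/bin" "$work/deref/"

# A tarball round trip keeps the link as a link
tar -czf "$work/bin.tgz" -C "$work/src" bin
tar -xzf "$work/bin.tgz" -C "$work/untarred"

# Record what each approach produced
if [ -L "$work/deref/bin/python2" ]; then deref_kind=link; else deref_kind=file; fi
if [ -L "$work/untarred/bin/python2" ]; then tar_kind=link; else tar_kind=file; fi
echo "copy that follows links: $deref_kind"
echo "tar round trip: $tar_kind"

# Clean up the scratch directory
rm -rf "$work"
```

The dereferencing copy reports a regular file, while the tarball round trip reports a link. Multiply that by every link in a Python installation and the size difference adds up.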

In the past, you might have used the ADD command to download and automatically unpack the Python tarball. Generally, it’s best to avoid that command. It also doesn’t support --from, so it can’t reference the previous stage. So how do you copy and unpack the tarball without using ADD?

The trick is to use RUN --mount, one of the advanced features of Docker’s BuildKit. It lets you mount files into a single build step temporarily. This can be used for a variety of purposes, such as caching or safely consuming secrets without building them into the final image. In this case, you’ll use it to mount the tarball from the previous stage and unpack it into the root of the image. This restores the Python 2 binaries and libraries, including the symbolic links, into the final image. Since the stage you built in the last article (tools) has everything but these binaries, you can use that as the base for this stage.

Since this is the last stage, you can leave off the AS part of the FROM command. This makes it the default build target. That means that you can build the complete image – all of the stages – by running docker build without needing to specify a target.

## Use the tools stage as the base for this final image. If you choose,
## you can also use the `FROM tools AS final` syntax to create a target
## that you can build by name.

FROM tools

## Make sure that the final image runs as the `runner` user. Just in case.
USER runner

## Mount the tarball from the previous stage just during this run
## command. Use that to unpack the binaries into the root. After this
## runs, the mounted file is removed and no reference to it is left
## in the final image. The layer contains just the unpacked files.
RUN --mount=type=bind,from=python2,source=/python2.tgz,target=/tmp/python2.tgz \
    sudo tar -xvf /tmp/python2.tgz -C /
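
Once the final stage builds, it’s worth confirming that Python 2 and its links made it into the image. The following is a minimal sketch, not part of the original Dockerfile. It assumes the image is tagged codeql-runner:local (a hypothetical tag) and that the install left /usr/local/bin/python2 as a link. The function names build_image and verify_image are illustrative only.

```shell
# Hypothetical tag for the locally built image; adjust it to suit your registry.
IMAGE_TAG="codeql-runner:local"

# Build the default target (the final, unnamed stage) from the current directory.
build_image() {
  docker build -t "$IMAGE_TAG" .
}

# Inspect the image with a plain shell instead of the image's own entrypoint.
# `ls -l` shows whether python2 is still a link; the version checks confirm
# that both the interpreter and pip2 can run.
verify_image() {
  docker run --rm --entrypoint /bin/sh "$IMAGE_TAG" -c \
    'ls -l /usr/local/bin/python2 && python2 --version && pip2 --version'
}
```

Calling build_image and then verify_image from the folder that contains the Dockerfile should list python2 as a link rather than a regular file if the tarball approach worked.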

Putting it all together

Putting all of this together, you now have a complete Dockerfile that builds a CodeQL image with Python 2 support. The final runner image will have Python 2, Python 3, Node.js, and a tool cache with the CodeQL binaries and Java. In addition, you can use the default workflow that Actions creates for build/publish to create the runner image.

The final code (and an Actions workflow to build the image) is available here: https://github.com/kenmuse/codeql-runner

Next steps

You now have a working image that supports CodeQL (and most of the languages that CodeQL supports). You can use this image in your ARC deployments or with your own runners. Of course, you can add some additional logic to your workflow to improve versioning or to run the workflow only if the base image or CodeQL version changes. You could even build the Python 2 binaries and cache the tar.gz, mounting that directly into the final stage (and only building the Python 2 image if the cache is missing or expired). Finally, consider using the actions/setup-* Actions to configure and install the language components so those versions are always available to your workflows.

Have fun exploring the possibilities!