First impressions and learnings on BuildKit's new supply chain security features


If you have recently come across the terms "SBOM" and "SLSA Provenance attestation" and you don't know what they are, or what their purpose is, welcome - you're not alone!

The recent release of BuildKit v0.11.0 has introduced new features that will help you secure the Supply Chain of container images. In this post, I want to help you understand the very basics of these concepts and how to get started with some practical examples. Also, I want to share my journey alongside the lessons I learned.

Create a BuildKit builder

First of all, you need a BuildKit builder instance whose version is >=v0.11.0 to be able to generate SBOM and SLSA Provenance attestations. Older versions of BuildKit won't support the generation of these artifacts.

You can create one as follows:

docker buildx create \
    --name=buildkit-container \
    --driver=docker-container \
    --driver-opt image=moby/buildkit:v0.11.2 \
    --bootstrap \
    --use
Command to create a BuildKit builder 

The --driver=docker-container flag creates a managed, customizable BuildKit environment that runs inside a Docker container, and the --bootstrap flag boots that container right after creation. This driver supports cache persistence, because it stores all the BuildKit state and related cache in a dedicated Docker volume.
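Since everything that follows depends on the builder running BuildKit v0.11.0 or later, a small guard in a build script can catch an outdated builder early. The following is only a sketch: version_at_least is a hypothetical helper name, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
#!/usr/bin/env bash
# version_at_least HAVE WANT
# Succeeds when version HAVE is greater than or equal to version WANT.
# Hypothetical helper; pre-release suffixes such as "-rc1" are not handled.
version_at_least() {
  local have="${1#v}" want="${2#v}"   # drop an optional leading "v"
  # After a version sort, the smaller of the two versions comes first.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

if version_at_least "v0.11.2" "v0.11.0"; then
  echo "builder is recent enough for attestations"
fi
```

In practice, the first argument would be the BuildKit version reported for your builder, for example by docker buildx inspect buildkit-container.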

Software Bill of Materials (SBOMs)

An SBOM is a comprehensive list of all the components, libraries, and dependencies that make up a software product. It includes information such as version numbers and licenses of each component.


Generating an SBOM for a Docker image

Generating an SBOM as part of building your Docker image is straightforward: pass the --sbom=true flag.

docker buildx build --sbom=true -t felipecruz/buildkit-ssc-features:sbom .

...
 => [linux/amd64] generating sbom using docker.io/docker/buildkit-syft-scanner:stable-1                           0.7s
...

Visualizing the SBOM

Now that the SBOM is generated, the next step is to see the content. Initially, I thought that by inspecting the local image I could find more information about the SBOM generated.

However, if you run docker image inspect felipecruz/buildkit-ssc-features:sbom you won't find the attestation as part of the image configuration.

👨‍🏫 The first lesson learned

Attestations are stored as manifest objects in the image index, similar in style to OCI artifacts. This requires you to push the image to a registry, or export it to a local directory as we will see later.

Note that the in-toto attestation contains a "https://spdx.dev/Document" predicate, signifying that it is defining an SBOM for the image.

We can use docker buildx imagetools inspect to see the manifest structure, or open https://explore.ggcr.dev/?image=felipecruz/buildkit-ssc-features:sbom to explore the contents interactively:

docker buildx imagetools inspect felipecruz/buildkit-ssc-features:sbom --raw

{
	"schemaVersion": 2,
	"mediaType": "application/vnd.oci.image.index.v1+json",
	"manifests": [
		...
		{
			"mediaType": "application/vnd.oci.image.manifest.v1+json",
			"size": 839,
			"digest": "sha256:2b971e062fac0be7592645739bd9255f7e2f028477b9b17a228fa7caa9548fdb",
			"platform": {
				"architecture": "unknown",
				"os": "unknown"
			},
			"annotations": {
				"vnd.docker.reference.digest": "sha256:799541c25e40891f6ebffca77caf9d7e8e141082cab03927932fe1ba5c6a584c",
				"vnd.docker.reference.type": "attestation-manifest"
			}
		},
		...
	]
}
Cropped raw output from the image in the registry
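To pick only the attestation manifests out of that index, one option is to filter on the vnd.docker.reference.type annotation with jq. This is a sketch rather than an official recipe: attestation_digests is a hypothetical helper name, and it assumes jq is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an OCI image index (JSON) on stdin and prints the
# digest of every manifest that is annotated as an attestation manifest.
attestation_digests() {
  jq -r '.manifests[]
         | select(.annotations["vnd.docker.reference.type"] == "attestation-manifest")
         | .digest'
}

# Usage (needs access to the registry):
#   docker buildx imagetools inspect felipecruz/buildkit-ssc-features:sbom --raw \
#     | attestation_digests
```

Manifests without annotations (the regular per-platform images) are simply skipped by the select filter.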

Because felipecruz/buildkit-ssc-features:sbom is a multi-platform image that targets linux/amd64 and linux/arm64, the output of inspecting the image in the registry will contain one SBOM per platform:

docker buildx imagetools inspect felipecruz/buildkit-ssc-features:sbom --format '{{ json .SBOM }}'

{
  "linux/amd64": {
    "SPDX": {
      "SPDXID": "SPDXRef-DOCUMENT",
      ...
  },
  "linux/arm64": {
    "SPDX": {
      "SPDXID": "SPDXRef-DOCUMENT",
      ...
  }
}

Therefore, to check the SBOM of a particular platform like linux/amd64, you can use the following --format expression:

docker buildx imagetools inspect felipecruz/buildkit-ssc-features:sbom --format '{{ json (index .SBOM "linux/amd64") }}'

{
  "SPDX": {
    "SPDXID": "SPDXRef-DOCUMENT",
    "creationInfo": {
...
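The per-platform output is still a large document, so it can help to reduce it to a list of package names and versions. A minimal sketch, assuming jq is installed and that the SPDX document carries the usual packages array with name and versionInfo fields; sbom_packages is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the per-platform SBOM JSON (the object with the
# top-level "SPDX" key) on stdin and prints one "name<TAB>version" line per
# package. Packages without a versionInfo field are reported as "unknown".
sbom_packages() {
  jq -r '.SPDX.packages[]? | "\(.name)\t\(.versionInfo // "unknown")"'
}

# Usage (needs access to the registry):
#   docker buildx imagetools inspect felipecruz/buildkit-ssc-features:sbom \
#     --format '{{ json (index .SBOM "linux/amd64") }}' | sbom_packages
```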

Alternatively, if you want to see the SBOM attestation without having to push the image to a registry, you can output the contents of the image to a local directory with the -o flag:

docker buildx build --sbom=true -o ./image .

cat ./image/sbom.spdx.json 

{
  "_type": "https://in-toto.io/Statement/v0.1",
  "predicateType": "https://spdx.dev/Document",
  "subject": [
    {
      "name": "bin/busybox",
      "digest": {
        "sha256": "36d96947f81bee3a5e1d436a333a52209f051bb3556028352d4273a748e2d136"
      }
    },
    ...
}

Find vulnerabilities in the SBOM

Finding vulnerabilities is out of the scope of BuildKit, so you need to use an external tool such as grype. This OSS tool allows us to find vulnerabilities in Docker images, SBOMs, etc.

In the beginning, I was confused as to why grype would fail to detect vulnerabilities in the SBOM generated by BuildKit.

cat ./image/sbom.spdx.json | grype -vv

...
[0000] DEBUG format syft-6-json returned err: could not extract syft schema form-lib=syft
[0000] DEBUG format cyclonedx-1-xml returned err: EOF form-lib=syft
[0000] DEBUG format cyclonedx-1-json returned err: not a valid CycloneDX document form-lib=syft
[0000] DEBUG format spdx-2-tag-value returned err: unable to decode spdx-tag-value: no colon found in '{' form-lib=syft
...
No vulnerabilities found

The errors above show that grype cannot parse the file generated by BuildKit. Digging into the source code of the SBOM generator that BuildKit uses, I found that the JSON-encoded SPDX document - the actual SBOM - is saved in the predicate field.

👨‍🏫 The second lesson learned

I had assumed that sbom.spdx.json was a plain SBOM in JSON-encoded SPDX format.

However, the sbom.spdx.json generated by BuildKit is in fact an in-toto attestation. The predicate property of the attestation contains a JSON-encoded SPDX document (the SBOM), whereas the subject lists the software artifacts that this SPDX document describes.

Finally, passing just the predicate part of the attestation to grype works as expected:

cat ./image/sbom.spdx.json | jq .predicate | grype --only-fixed
NAME                   INSTALLED                 FIXED-IN  TYPE            VULNERABILITY   SEVERITY                
bash                   5.1-6ubuntu1                        deb             CVE-2022-3715   Low (suppressed)         
coreutils              8.32-4.1ubuntu1                     deb             CVE-2016-2781   Low (suppressed)         
gpgv                   2.2.27-3ubuntu2.1                   deb             CVE-2022-3219   Low (suppressed)         
libc-bin               2.35-0ubuntu3.1                     deb             CVE-2016-20013  Negligible (suppressed) 
...   
List of vulnerabilities detected in the SBOM by Grype
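The unwrapping step can be made a little more defensive by checking the predicateType before handing the predicate to a scanner, so that a provenance attestation is not mistaken for an SBOM. This is a sketch under the assumption that jq is installed; extract_spdx is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the SPDX document wrapped in an in-toto
# attestation file, and fails if the attestation is not an SPDX one.
extract_spdx() {
  jq -e 'if .predicateType == "https://spdx.dev/Document"
         then .predicate
         else error("not an SPDX attestation: \(.predicateType)")
         end' "$1"
}

# Usage, mirroring the command above:
#   extract_spdx ./image/sbom.spdx.json | grype --only-fixed
```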

Multi-stage images

When using BuildKit to generate SBOMs, I expected the SBOM to account for the dependencies of every intermediate stage that the final stage depends on.

👨‍🏫 The third lesson learned

Apparently, that's not the default behavior, because scanning every stage is computationally more expensive and makes the SBOM take longer to produce. To enable it, you need to set the BUILDKIT_SBOM_SCAN_STAGE=true build argument (for example with --build-arg, or with an ARG instruction in the Dockerfile).

For instance, you may have a build stage that uses curl to download a binary and a final stage where you copy that binary. It wouldn't be accurate to track the curl dependency as part of the final stage when it's not used in that final stage.
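If you do want intermediate stages scanned, the BuildKit documentation describes BUILDKIT_SBOM_SCAN_STAGE as a build argument, so one way to pass it is on the command line. The wrapper below is a sketch; build_with_stage_sbom is a hypothetical name and any extra arguments are forwarded to buildx unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: builds with an SBOM attestation and asks BuildKit to
# scan intermediate stages as well. Extra arguments go straight to buildx.
build_with_stage_sbom() {
  docker buildx build \
    --sbom=true \
    --build-arg BUILDKIT_SBOM_SCAN_STAGE=true \
    "$@"
}

# Usage:
#   build_with_stage_sbom -o ./image .
```

Setting the argument from the command line applies it to the whole build; declaring it with ARG inside a single stage of the Dockerfile is the more selective alternative.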

SLSA Provenance Attestation

Earlier, we saw that an SBOM is like a list of all the "ingredients" used in a recipe, along with details such as the quantities, versions, and sources of each ingredient.

On the other hand, an SLSA provenance attestation is like a certificate of authenticity for each ingredient. The software vendor provides this certificate to confirm that the ingredients used in the recipe are legitimate, unmodified, and comply with their associated licenses.

Generating an SLSA provenance attestation

The provenance attestation created by BuildKit describes how the build was created. Similarly to the SBOM, the SLSA provenance is attached to the image index, wrapped inside an in-toto attestation whose predicate contains the actual SLSA provenance.

To generate an SLSA provenance attestation, pass the --provenance=true flag when building your image:

docker buildx build --sbom=true --provenance=true -t felipecruz/buildkit-ssc-features .

Visualizing the SLSA provenance

Given that the image is multi-platform, you need to target the platform-specific provenance to visualize its content. You can use docker buildx imagetools inspect, or explore it interactively at explore.ggcr.dev:

docker buildx imagetools inspect felipecruz/buildkit-ssc-features --format '{{ json (index .Provenance "linux/amd64") }}'
{
  "SLSA": {
    "buildType": "https://mobyproject.org/buildkit@v1",
    "builder": {
      "id": ""
    },
    "invocation": {
      "configSource": {
        "entryPoint": "Dockerfile"
      },
  ...
 }

By inspecting the content we can see some interesting information that has been generated automatically, such as:

  • Build timestamps: when the build was started and finished.
  • Invocation info: how the build was invoked, which in my case was using the dockerfile.v0 frontend, linux/amd64 as the environment platform and Dockerfile as the entry point.
  • The build materials: such as the Docker images used as part of the build, and the Git URLs of the repositories containing source code for the image, among others.

That information is just the minimum you get by default. Passing --provenance=mode=max generates significantly more information on top of it, such as descriptions of all build steps with their source and layer mappings.
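To look only at the build materials mentioned above, the provenance JSON can be reduced to a list of URIs. A minimal sketch, assuming jq is installed and that the provenance follows the SLSA layout in which materials is an array of objects with a uri field; provenance_materials is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the per-platform provenance JSON (the object with
# the top-level "SLSA" key) on stdin and prints the URI of each build material.
provenance_materials() {
  jq -r '.SLSA.materials[]? | .uri'
}

# Usage (needs access to the registry):
#   docker buildx imagetools inspect felipecruz/buildkit-ssc-features \
#     --format '{{ json (index .Provenance "linux/amd64") }}' | provenance_materials
```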

Reproducibility

BuildKit now supports reproducible builds through the SOURCE_DATE_EPOCH build argument or the source-date-epoch exporter attribute. This deterministic date is used in the image metadata (i.e. in the image config and layers) instead of the current time.

SOURCE_DATE_EPOCH=0 docker buildx build ...
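A value of 0 pins every timestamp to the Unix epoch. A common alternative is to use the timestamp of the latest Git commit, so the date stays deterministic but still meaningful. The helper below is a sketch; source_date_epoch is a hypothetical name, and falling back to 0 outside a Git repository is my own choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the committer timestamp (seconds since the Unix
# epoch) of the latest commit, or 0 when that cannot be determined (no git,
# not a repository, or no commits yet).
source_date_epoch() {
  git log -1 --pretty=%ct 2>/dev/null || echo 0
}

# Usage:
#   SOURCE_DATE_EPOCH="$(source_date_epoch)" docker buildx build ...
```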

👨‍🏫 The fourth lesson learned

I would have thought that, by running the same build twice without having modified any source files, I would get the same provenance attestation output and the reproducible field would be set to true automatically.

However, the reproducible field in the attestation was always set to false. Unfortunately, even with SOURCE_DATE_EPOCH set, BuildKit still can't automatically determine whether a build is reproducible, because of factors such as race conditions in the build, use of timing information, or randomness from /dev/random.

From what I have read, reproducible seems to be an input value that I provide at build time, based on whether I consider the build to be reproducible: --provenance=reproducible=true.
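To confirm what ended up in the attestation, the flag can be read back from the provenance. This is a sketch assuming jq is installed and that the field lives under metadata.reproducible, as in the SLSA provenance layout; provenance_reproducible is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the per-platform provenance JSON (the object with
# the top-level "SLSA" key) on stdin and prints "true" or "false" depending on
# the metadata.reproducible field. A missing field is reported as "false".
provenance_reproducible() {
  jq -r '.SLSA.metadata.reproducible // false'
}

# Usage (needs access to the registry):
#   docker buildx imagetools inspect felipecruz/buildkit-ssc-features \
#     --format '{{ json (index .Provenance "linux/amd64") }}' | provenance_reproducible
```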

Conclusion

Having SBOM and SLSA provenance attestations generated as part of your Docker build process is now very easy and convenient. I'm impressed by how straightforward it is to generate them by passing flags to the docker buildx build command.

The important discovery for me was that BuildKit doesn't output SBOMs and SLSA provenance as standalone documents, as some other tools do. Instead, they are wrapped inside in-toto attestations and attached as manifests to the image's root index.

I'm excited to see how BuildKit will continue evolving in securing the Software Supply Chain and supporting us (developers) in our efforts to provide more security-related information when distributing container images.
