Reproducible Builds at Thistle
In this blog post, we describe Thistle's approach to software supply-chain security: reproducible builds, for both client-side executables and the Docker container images of our backend services.
Reproducible Builds As A Thistle Engineering Practice
At Thistle Technologies we maintain and rely on reproducible builds for our compiled binaries (Rust applications) and service containers (Go applications) as part of our engineering practice. Reproducible builds ensure that our deployed artifacts can be built deterministically from source code. The benefits are threefold:
Security. Reproducible builds allow deployable artifacts to be built and cross-checked independently by multiple parties, whether machines or humans, and therefore provide better software supply chain security. To see the importance of independent verification of software compilation, interested readers may read (or revisit) Ken Thompson's Turing Award lecture, "Reflections on Trusting Trust".
Debuggability. Reproducible builds make it easier to test and debug programs, because we can rebuild locally, and thus debug locally, binaries that are bit-for-bit identical to the ones running in production.
Ease of release. At Thistle, our software release processes are largely automated and run by CI/CD pipelines. Reproducible builds allow us to perform QA testing before a production release is triggered, and to be sure that, when QA passes, the artifacts that were tested are identical to the ones that get released. Being able to decouple the QA testing and artifact publication steps simplifies our release process, and also minimizes the exposure window of the production release artifacts.
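The cross-checking described above comes down to comparing cryptographic digests of artifacts that were built independently. Below is a minimal sketch in Python, not our actual tooling; the function names and the example paths in the usage note are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_agree(artifact_paths):
    """Return True if all independently built artifacts are bit-for-bit identical.

    Identical SHA-256 digests are treated as proof of identical content.
    An empty list of paths yields False, since nothing was verified.
    """
    digests = {sha256_of_file(path) for path in artifact_paths}
    return len(digests) == 1
```

For example, `builds_agree(["ci-build/app", "laptop-build/app"])` would compare an artifact produced by a CI pipeline against one rebuilt on an engineer's machine. The same check is what lets a QA-tested artifact stand in for the one that is later published.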
To obtain reproducible builds, the following conditions are usually necessary, but not sufficient:
Deterministic build system, including deterministic build scripts, preprocessor, compiler, linker, and so on
Carefully written source code. E.g., avoid C macros such as __DATE__, __TIME__, and __TIMESTAMP__
Pinned dependencies (which in turn need to be reproducible if built from source)
Well-defined, uniquely specified build environments, including operating systems, file systems, build paths, environment variables, etc.
Identifying and removing the non-deterministic parts of a build process often takes trial and error, plus some reverse-engineering skill, both to obtain reproducibility in the first place and to maintain it over time.
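A typical first step in that hunt is to build twice and find where the two outputs diverge. The sketch below is a deliberately small illustration of the idea, with hypothetical helper names; dedicated tools such as diffoscope do this far more thoroughly.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if a == b.

    If one input is a strict prefix of the other, the offset returned is
    the length of the shorter input.
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def context(data: bytes, offset: int, width: int = 16) -> bytes:
    """Return the bytes around an offset, to help guess what was embedded there."""
    start = max(0, offset - width)
    return data[start:offset + width]
```

Printing `context(...)` around the reported offset for both builds often makes the culprit obvious, for instance an embedded build date or an absolute build path.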
Rust and Go are the two major programming languages used by Thistle engineers. In both cases we use Nix for the heavy lifting to obtain reproducible builds. Our respective implementations are described in the rest of this blog post.
Reproducible Rust Executables at Thistle
Our Rust executables, specifically those for the Thistle Update Client (TUC) and Thistle Release Helper (TRH), are statically linked and cross-compiled on Linux for various architectures (aarch64, armv6, armv7, x86_64). To make them "universally reproducible" (i.e., reproducible on different build machines), we follow the recipe below.
We use the Nix package manager, with a pinned commit of nixpkgs and rust-overlay, to create a deterministic build system, including a fixed snapshot version of the Rust toolchain
We take care when writing code so that we don't introduce non-determinism, e.g., associated with build time or build path, in the compiled code
We need to identify and resolve any non-determinism introduced in dependency packages. For example, we discovered a weakness / bug in a dependency library, zstd-rs, that made the build non-reproducible, and resolved it. We also contributed our determinism improvement to the zstd-rs project.
We use Docker to make the reproducible build process a one-click experience.
We open-sourced our recipe for building reproducible Rust executables in the GitHub repository rust-cross-build-nix. Have a look on GitHub and try it out in your own projects.
Reproducible Docker Images for Go Applications at Thistle
Thistle's backend services are primarily written in Go, and are deployed to Google Cloud as Docker containers. Go as a programming language makes it fairly easy to get "locally reproducible" application binaries (i.e., binaries that are reproducible on your local machine with a given version of the Go compiler), when we:
use go.sum files
use the -trimpath flag when executing go build
don't insert any timestamp information in the executable
(Note that starting from Go 1.21.0, the Go toolchains themselves are easily reproducible.)
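As a rough illustration of the list above, here is how a build script might assemble the go build invocation. This is a sketch rather than our build tooling: the helper names are hypothetical, and setting CGO_ENABLED=0 is an extra assumption of this example (a pure-Go build that does not depend on the host C toolchain), not something the list above prescribes.

```python
import os
import subprocess


def go_build_command(output_path, package="."):
    """Assemble a `go build` invocation that keeps local paths out of the binary."""
    return [
        "go", "build",
        "-trimpath",        # remove local file system paths from the executable
        "-o", output_path,
        package,
    ]


def go_build_env(base_env=None):
    """Return the environment for the build, starting from base_env or os.environ."""
    env = dict(os.environ if base_env is None else base_env)
    env["CGO_ENABLED"] = "0"  # assumption of this sketch: no cgo, no host C compiler
    return env


def build(output_path, package=".", cwd=None):
    """Run the build; raises CalledProcessError if `go build` fails."""
    subprocess.run(
        go_build_command(output_path, package),
        cwd=cwd,
        env=go_build_env(),
        check=True,
    )
```

Dependency pinning needs no extra flag here: the go command checks downloaded modules against go.sum on its own. Building the same commit from two different checkout directories and comparing the resulting digests is a quick way to confirm that no build path leaked into the binary.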
We then use the r10edocker tool to turn locally reproducible Go applications into universally reproducible Docker container images in the form of ready-to-deploy gzipped tarballs. The r10edocker tool uses Nix under the hood to build Docker images. Besides being universally reproducible, the Docker images it creates are minimal: they contain only the application(s), and do not include an OS shell, a package manager, etc. Minimal containers minimize attack surface.
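To give a feel for what a reproducible gzipped tarball involves, the sketch below packs a set of files so that the output depends only on their names and contents. It is a generic illustration in Python and not a description of how r10edocker or Nix work internally; the function name is hypothetical.

```python
import gzip
import io
import tarfile


def deterministic_tar_gz(files):
    """Pack {archive_name: content_bytes} into a .tar.gz with stable bytes.

    Common sources of non-determinism are pinned: entry order, timestamps,
    ownership, permissions, and the gzip header.
    """
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):            # fixed entry order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0                    # no build-time timestamps
            info.uid = info.gid = 0           # no builder-specific ownership
            info.uname = info.gname = ""
            info.mode = 0o755
            tar.addfile(info, io.BytesIO(data))

    gz_buffer = io.BytesIO()
    # mtime=0 and an empty filename keep the gzip header free of local details.
    with gzip.GzipFile(filename="", fileobj=gz_buffer, mode="wb", mtime=0) as gz:
        gz.write(tar_buffer.getvalue())
    return gz_buffer.getvalue()
```

Calling the function twice with the same files, in any insertion order, yields identical bytes within one environment. The compressed bytes can still vary between compressor versions, which is one more reason to pin the whole toolchain rather than only the inputs.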
Conclusion
Thistle's engineering practice around reproducible builds enhances our confidence in software supply chain security, makes debugging and troubleshooting easier for engineers, and simplifies our production release process. On the technology side, we leverage Nix to make our builds deterministic/reproducible, and use Docker to make building artifacts a one-click process, which makes independent checks easy to perform.