Build Reproducibility Workshop Report

Tue Dec 8 14:48:35 PST 2015

Hello fellow MacPorts people,

Brace yourselves, this mail is going to be long.

As some of you may know [1], I recently represented MacPorts at a
workshop on build reproducibility in Athens organized by the Debian
folks. A wide range of projects and package management systems was
represented; the full list is available on the website [2]. The *BSDs
and Homebrew (represented by Mike McQuaid) are probably most relevant to
our interests.

Use Cases
=========
If you are not sure why we should care about reproducible builds, let me
assure you there are plenty of reasons. A more detailed rationale was
prepared at the workshop and should be available at [3] soon, but I'll
take the time to mention a few points. Note that "reproducible builds"
in this context always means bit-by-bit reproducibility, unless
explicitly stated otherwise.

 - Source <-> Binary Correspondence: Reproducible builds allow
   developers to verify that the binaries our buildbot serves actually
   correspond to the source code they claim to be built from.
 - Attack Surface Reduction: Having reproducible builds reduces the
   motivation to attack our buildbot setup, because modifications can be
   detected.
 - Caching: You can avoid rebuilds of packages that build reproducibly
   if the inputs haven't changed. This doesn't seem to be *that*
   important to us at the moment, but it is a big selling point,
   especially for commercial software development.
 - Delta Reduction: With reproducible builds, small changes in source
   will be more likely to cause small changes in the resulting binary.
   This could be used to allow binary-delta updating, reducing download
   time, bandwidth requirements, and update time.
 - Support Burden Reduction: Build reproducibility can provide
   confidence that a user's build is exactly what a packager intended
   and rule out a whole class of bugs caused by differences between
   build environments.

How to Build Reproducibly
=========================
The process of achieving reproducible builds is fairly well understood.
The reproducible-builds.org documentation outlines the most common
problems that prevent reproducible builds [3]. For the most part, all
distributions face the same issues, which allows us to build on the
efforts of projects with more manpower, like Debian. There are a couple
of points that might not be obvious or are easily overlooked that I'd
like to point out:

 - Filesystem ordering and locale-dependent sorting: Relying on the
   order of files that readdir(3) returns makes builds unreproducible.
   Sorting those files will only help if the sort result doesn't differ
   by locale.
 - Timestamps are everywhere and are responsible for a large part of
   unreproducible builds. Using __DATE__, __TIME__, __TIMESTAMP__, or
   similar macros should be avoided. Version numbers or version control
   system information are much better replacements: If your build is
   reproducible, it does not matter *when* it happened. However, lots of
   tools include timestamps by default, such as gzip(1) when compressing
   our manpages (r143068) or tar(1) when creating our binary archives.
   Strategies to solve these problems exist, e.g. clamping all
   timestamps to a ceiling value while creating a tarball, or using the
   environment variable SOURCE_DATE_EPOCH [4] instead of the current
   time in date-dependent macros.
 - Well-defined build environments: Pretty much the rest of the world
   has good OS-level support for a chroot(2)-like mechanism that can be
   used to provide a build environment that only contains inputs from a
   controlled list of dependencies. FreeBSD has jails, Linux has
   namespaces, but the only thing OS X supports in this direction is
   chroot(2), and chroots have a reputation for breaking some of Apple's
   tools like xcodebuild (a reputation I may set out to verify or
   refute). Trace mode is a step in the right direction, but it doesn't
   catch everything and is very slow compared to other methods. To
   complicate matters further, we rely on Apple's toolchain, which can
   be updated or changed independently of MacPorts.
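To illustrate the first two points, here is a minimal Python sketch
(Python and all names in it are my own choice for illustration, not
anything MacPorts or Debian ships). sorted_tree() returns a file list
whose order depends neither on readdir(3) nor on the locale, because
Python compares strings by code point; build_date_string() takes its
date from SOURCE_DATE_EPOCH [4] when it is set and formats it in UTC so
that TZ cannot leak into the output.

```python
import os
import time


def sorted_tree(root):
    """Return the paths of all files below root, relative to root.

    os.walk() yields entries in whatever order readdir(3) returns them, so
    the result is sorted explicitly. Python compares str objects by code
    point, so the order does not depend on LANG or LC_ALL.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(paths)


def build_timestamp():
    """Seconds since the epoch to embed into build output.

    Honors SOURCE_DATE_EPOCH if it is set and only falls back to the
    current time otherwise.
    """
    value = os.environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def build_date_string():
    """Human-readable build date, always formatted in UTC."""
    return time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(build_timestamp()))
```

With SOURCE_DATE_EPOCH=0 in the environment, build_date_string()
returns "1970-01-01 00:00:00 UTC" regardless of TZ or the time of the
build.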

Testing Build Reproducibility
=============================
In order to find out whether a build is reproducible, it has to be
performed multiple times, ideally with varied input settings. The more
input and environment settings can be modified without changing the
build result, the higher the reproducibility. Debian has a couple of
machines available and runs a Jenkins setup that will build each package
twice but vary a couple of settings for the second build, such as:
hostname, domainname, environment variables (TZ, LANG, LC_ALL, PATH),
UID/GID, kernel version, umask, CPU type, current time (shifted by a
large amount to trigger changes in year, month and day regardless of
timezone), and filesystem sort order (by using a FUSE filesystem that
will make readdir(3) return different results). While Debian's setup is
available for use by other projects, it is of little use to us because
OS X cannot be virtualized on non-Apple hardware without violating the
EULA. One of the biggest hurdles towards systematic testing for build
reproducibility in MacPorts (and Homebrew as well, btw) is thus the
availability of Apple hardware.
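On a single Mac, the basic idea of Debian's setup can be approximated by
building twice with a few settings changed and comparing the results bit
by bit. The following is a minimal Python sketch under that assumption:
the build command, the artifact name and the concrete values in the
hypothetical VARIATIONS table are placeholders, and it only varies TZ,
LANG and LC_ALL, not hostname, UID, umask, time or filesystem order.

```python
import hashlib
import os
import subprocess

# Settings changed for the second build. The values are arbitrary examples
# (Etc/GMT-14 is UTC+14); Debian varies many more settings than these.
VARIATIONS = {
    "TZ": "Etc/GMT-14",
    "LANG": "fr_CH.UTF-8",
    "LC_ALL": "fr_CH.UTF-8",
}


def second_build_env(base_env):
    """Return a copy of base_env with the varied settings applied."""
    env = dict(base_env)
    env.update(VARIATIONS)
    return env


def sha256_file(path):
    """SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_twice(cmd, artifact, dir_a, dir_b):
    """Run cmd in two separate directories and compare the artifacts.

    The first build uses the current environment, the second one the
    varied environment. Returns True if both artifacts are identical.
    """
    subprocess.run(cmd, cwd=dir_a, env=dict(os.environ), check=True)
    subprocess.run(cmd, cwd=dir_b, env=second_build_env(os.environ), check=True)
    return sha256_file(os.path.join(dir_a, artifact)) == sha256_file(
        os.path.join(dir_b, artifact)
    )
```

A False result only says that the build is not reproducible under these
variations; finding out *why* is what diffoscope is for.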

To track down the differences that make builds non-reproducible, a
couple of people from the Debian reproducible builds effort have written
diffoscope [5], a Python diff tool that interprets many file formats and
tries hard to give you a human-readable difference between two files.
Support for Mach-O binaries is available as a patch at [6] (I hope to
push it upstream soon). This tool could also be helpful to look at the
differences introduced by stealth updates, i.e. distfiles that upstream
replaced without changing their name.

State of Reproducible Builds in MacPorts
========================================
Despite the obstacles mentioned above, build reproducibility in
MacPorts is actually not a lost cause. This is partly because we have
historically tried to keep build environments clean and similar across
machines, e.g. by using privilege separation, removing all but a few
white-listed environment variables, and using trace mode. At the
moment, timestamps are our biggest obstacle on the road towards
reproducible tarballs. In a quick and informal test, Marius Schamschula
and I managed to reproduce our build of bash except for timestamps in
gzip headers and tarball metadata. Unfortunately, generating statistics
on reproducibility requires support on the buildservers.
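Both remaining differences can be removed when the archive is written.
A minimal Python sketch, assuming the archive is created with Python's
tarfile and gzip modules rather than with tar(1) and gzip(1) as MacPorts
does: the gzip header timestamp is fixed to a given epoch, members are
added in sorted order, and owner, group, permission bits and mtime are
normalized. deterministic_tgz() and normalize() are hypothetical names.

```python
import gzip
import io
import os
import tarfile


def normalize(info, epoch):
    """Remove per-machine and per-build metadata from a tar member."""
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    # Permission bits depend on the umask; keep only "executable or not".
    info.mode = 0o755 if info.isdir() or info.mode & 0o111 else 0o644
    # Clamp to the ceiling value; older timestamps are kept as they are.
    info.mtime = min(int(info.mtime), epoch)
    return info


def deterministic_tgz(root, paths, epoch):
    """Pack the given paths (relative to root) into .tar.gz bytes."""
    buf = io.BytesIO()
    # mtime=epoch fixes the timestamp in the gzip header. BytesIO has no
    # name attribute, so no file name ends up in the header either.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=epoch) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
            for rel in sorted(paths):  # fixed member order
                tar.add(
                    os.path.join(root, rel),
                    arcname=rel,
                    recursive=False,
                    filter=lambda info: normalize(info, epoch),
                )
    return buf.getvalue()
```

The GzipFile is created explicitly so that its mtime argument can be
set; touching the input files afterwards does not change the resulting
bytes as long as their mtimes stay above the epoch.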

To fix the timestamp issues, I am looking for a suitable value to use as
SOURCE_DATE_EPOCH. Before creating the archive, a find statement would
then put an upper limit on the mtime of all files to be packaged. I am
not yet sure what a good (reproducible!) timestamp might be:
 - The Portfile mtime would be perfect, but it is not preserved by
   Subversion, so we cannot rely on it. It is preserved by our rsync
   sync, but that mtime is probably meaningless since it is the one
   generated on the rsync server during svn update.
 - The newest timestamp inside a source code tree is a good choice (and
   https://github.com/0-wiz-0/findnewest could easily give us that
   timestamp), but sources fetched from version control systems do not
   always set it to the time of the commit (AFAIK Git doesn't, for
   example).
 - A fixed value of 0 or 1 is not a very good choice.
We could put an additional piece of metadata into Portfiles to be used
as the timestamp (similar to how we specify checksums). It is my
understanding that FreeBSD will choose to go this route.
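For the source tree option, the two steps described above (find the
newest timestamp, then clamp everything newer than SOURCE_DATE_EPOCH
before packaging) could look like this in Python. This is a sketch with
hypothetical function names; it is neither the findnewest tool nor the
find statement we would actually use, and it skips symlinks for
simplicity.

```python
import os


def newest_mtime(root):
    """Newest mtime (in whole seconds) of any non-symlink file below root.

    Returns None for a tree without files. This is only a meaningful
    SOURCE_DATE_EPOCH if fetching preserved the upstream mtimes.
    """
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            mtime = int(os.path.getmtime(path))
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def clamp_mtimes(root, epoch):
    """Set the mtime of every file or directory newer than epoch to epoch.

    Older timestamps are left untouched, so the result is an upper limit
    rather than one fixed value for all files.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        if os.path.islink(path):
            continue  # os.utime() would follow the link
        st = os.stat(path)
        if st.st_mtime > epoch:
            os.utime(path, (st.st_atime, epoch))
```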

Miscellaneous Topics
====================
I've learned that our builds of GHC and all Haskell modules are likely
ABI-incompatible between archives downloaded from the buildbot and
local builds. We should disable parallel building for Haskell ports to
fix this until upstream provides a better solution. Luckily, this hasn't
affected us much yet, because binary archives are available for most
Haskell ports.

Homebrew achieves good binary package coverage for non-default prefixes
by scanning the build results for $prefix. At installation time, the
path in library load commands is changed locally using
install_name_tool(1), and in text files, it is simply replaced. If
$prefix is found in a binary file, the archive is marked as
non-prefix-invariant and ignored by installations with a non-default
prefix.
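A minimal Python sketch of the scanning step (hypothetical names;
Homebrew itself is written in Ruby and its implementation differs):
files containing $prefix are split into text and binary files using a
crude NUL-byte heuristic, text files can be rewritten in place, and any
binary hit makes the archive non-prefix-invariant. Unlike Homebrew, it
does not rewrite Mach-O load commands with install_name_tool(1), so a
library whose only reference is its install name also counts as a
binary hit here.

```python
import os


def classify_prefix_use(root, prefix):
    """Return (text_files, binary_files) below root that contain prefix.

    A file counts as binary if it contains a NUL byte, which is only a
    heuristic chosen for this sketch.
    """
    needle = prefix.encode()
    text_files, binary_files = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            with open(path, "rb") as f:
                data = f.read()
            if needle not in data:
                continue
            rel = os.path.relpath(path, root)
            (binary_files if b"\0" in data else text_files).append(rel)
    return sorted(text_files), sorted(binary_files)


def relocate_text(path, old_prefix, new_prefix):
    """Replace every occurrence of old_prefix in a text file."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(old_prefix.encode(), new_prefix.encode()))


def is_prefix_invariant(root, prefix):
    """True if no binary file below root mentions prefix."""
    return not classify_prefix_use(root, prefix)[1]
```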

Homebrew provides compiler wrappers that ensure build systems are
UsingTheRightCompiler and that compiler flags are set as expected
(e.g. -arch flags, the -stdlib flag for C++).
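The flag-normalization part of such a wrapper can be sketched in Python
as follows. normalize_args() is a hypothetical name; it only handles the
separate-argument form of -arch and the -stdlib=VALUE form, and it is
not how Homebrew implements its wrappers. A wrapper script would pass
its own arguments through this function and then exec the real
compiler.

```python
def normalize_args(args, arch, stdlib=None):
    """Drop -arch/-stdlib flags from args and append the wanted ones.

    args excludes the compiler name. -arch takes its value as the next
    argument; -stdlib uses the -stdlib=VALUE form and is only added when
    stdlib is given (it only makes sense for C++).
    """
    result = []
    skip_next = False
    for arg in args:
        if skip_next:
            skip_next = False  # this was the value of a dropped -arch
            continue
        if arg == "-arch":
            skip_next = True
            continue
        if arg.startswith("-stdlib="):
            continue
        result.append(arg)
    result += ["-arch", arch]
    if stdlib is not None:
        result.append("-stdlib=" + stdlib)
    return result
```

For example, ["-O2", "-arch", "ppc", "-c", "x.cpp"] with arch "x86_64"
and stdlib "libc++" becomes ["-O2", "-c", "x.cpp", "-arch", "x86_64",
"-stdlib=libc++"].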

Google's Blaze build system (open-sourced as Bazel) supports license
annotations on build results and license compatibility analysis. Their
approach to the problem might be interesting input for the set of
scripts we use to determine whether a binary archive is distributable.

Acknowledgments
===============
I'd like to thank portmgr@ for giving me the chance to represent the
MacPorts Project at this event.

Travel and accommodation were sponsored by the Linux Foundation.
The conference venue and moderation were sponsored by the Open
Technology Fund.
Dinner was provided by Google ;-)

[1] https://lists.macosforge.org/pipermail/macports-dev/2015-September/031440.html
[2] https://reproducible-builds.org/events/athens2015/
[3] https://reproducible-builds.org/docs/
[4] https://reproducible-builds.org/specs/source-date-epoch/
[5] https://diffoscope.org/
[6] https://lists.reproducible-builds.org/pipermail/diffoscope/2015-December/000000.html

-- 
Clemens Lang
MacPorts Developer