Build Reproducibility Workshop Report
Ryan Schmidt
ryandesign at macports.org
Wed Dec 9 17:21:09 PST 2015
Thank you for your attendance and for your report!
On Dec 8, 2015, at 2:48 PM, Clemens Lang wrote:
>
> Testing Build Reproducibility
> =============================
> In order to find out whether a build can be reproduced, it should be
> done multiple times, with possibly varying input settings. The more
> input and environment settings can be modified without the build result
> changing, the higher the reproducibility. Debian has a couple of
> machines available and runs a Jenkins setup that will build each package
> twice but vary a couple of settings for the second build, such as:
> hostname, domainname, environment variables (TZ, LANG, LC_ALL, PATH),
> UID/GID, Kernel version, umask, CPU type, current time (by a large
> amount to trigger changes in year, month and day regardless of
> timezone), and filesystem sort order (by using a FUSE filesystem that
> will make readdir(3) return different results). While Debian's setup is
> available for use by other projects, it is of little use to us because
> OS X cannot be virtualized on non-Apple hardware without violating the
> EULA. One of the biggest hurdles towards systematic testing for build
> reproducibility in MacPorts (and Homebrew as well, btw) is thus the
> availability of Apple hardware.
Apple hardware is available. Virtualizing OS X on Apple hardware is no problem; we do it today with our existing buildbot builders.
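The comparison step of the "build twice, vary the environment, compare" approach described above can be sketched roughly as follows. This is only an illustration: the helper names `tree_digest` and `differing_paths` are made up for this example, and it assumes the two builds have already been installed into two separate directories. It hashes regular files and file symlinks only and ignores metadata such as timestamps and permissions.

```python
import hashlib
import os


def tree_digest(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Hash the link target rather than following the link.
                data = os.readlink(path).encode()
            else:
                with open(path, "rb") as f:
                    data = f.read()
            rel = os.path.relpath(path, root)
            digests[rel] = hashlib.sha256(data).hexdigest()
    return digests


def differing_paths(root_a, root_b):
    """Return the sorted relative paths that are missing or differ in content."""
    a, b = tree_digest(root_a), tree_digest(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from `differing_paths(first_destroot, second_destroot)` would mean the file contents matched; any path it returns is a candidate for closer inspection.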
> State of Reproducible Builds in MacPorts
> ========================================
> Despite the several obstacles mentioned above, build reproducibility in
> MacPorts is actually not a lost cause. This is partly because we have
> historically always tried to keep a clean and similar build environment
> across machines, e.g. by using privilege separation and trace mode, and
> by removing all but a few white-listed environment variables. Timestamps are
> our biggest issue on the road towards reproducible tarballs at the
> moment. In a sloppy test done by Marius Schamschula and me, we managed
> to reproduce our builds of bash down to timestamp issues in gzip headers
> and tarball metadata. Unfortunately, generating statistics on
> reproducibility requires buildserver support.
I'm open to changes in the buildbot builder setup or replacing it, if there's something better.
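To illustrate the two places where timestamps leak into an archive (the gzip header and the per-file tar metadata), here is a rough Python sketch of writing a .tar.gz whose bytes do not depend on when it was created. It is not how MacPorts creates its archives; the function name `reproducible_tar_gz` and the `epoch` parameter are assumptions for this example. File modes are left untouched, so a varying umask could still change the result.

```python
import gzip
import os
import tarfile


def reproducible_tar_gz(src_dir, out_path, epoch):
    """Archive src_dir with sorted entries, fixed owner and clamped mtimes."""

    def normalize(info):
        # Clamp the mtime and drop the builder's user and group details.
        info.mtime = int(min(info.mtime, epoch))
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    paths = []
    for dirpath, dirnames, filenames in os.walk(src_dir):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))

    with open(out_path, "wb") as raw:
        # An empty filename and a fixed mtime keep the gzip header constant.
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=epoch) as gz:
            with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
                # Sorting removes any dependence on readdir order.
                for path in sorted(paths):
                    tar.add(path, arcname=os.path.relpath(path, src_dir),
                            recursive=False, filter=normalize)
```

Two runs over the same file contents should then produce identical archives, provided the same Python and zlib versions do the compression.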
> To fix the timestamp issues, I am looking for a suitable value to use as
> SOURCE_DATE_EPOCH and then add a find statement before creating the
> archive that will put an upper mtime limit on all files to be packaged.
> I am not yet sure what a good (reproducible!) timestamp might be:
> - The Portfile mtime would be perfect, but is not preserved by
> Subversion, so we cannot rely on it. It is preserved by our rsync
> sync, but the mtime in that is probably meaningless since it's the
> one generated on the rsync server during svn update.
And it differs between different rsync mirrors, because they mirror at different times. We should examine our mirroring strategy and fix it so the mirrors are true copies, including metadata like timestamps.
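Whatever value ends up being used for SOURCE_DATE_EPOCH, the "upper mtime limit" step itself is simple. As a rough sketch in Python rather than find (the function name `clamp_mtimes` is made up, and this is not an existing MacPorts routine), it could look like this:

```python
import os


def clamp_mtimes(root, epoch):
    """Set the mtime of every entry under root that is newer than epoch to epoch.

    Returns the number of entries changed. root itself is left alone.
    """
    # Not every platform can set timestamps on a symlink itself.
    can_touch_links = os.utime in os.supports_follow_symlinks
    clamped = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime <= epoch:
                continue
            if os.path.islink(path):
                if not can_touch_links:
                    continue
                os.utime(path, (epoch, epoch), follow_symlinks=False)
            else:
                # This sketch sets the access time to epoch as well.
                os.utime(path, (epoch, epoch))
            clamped += 1
    return clamped
```

It would be called on the destroot just before archiving, for example as `clamp_mtimes(destroot, int(os.environ["SOURCE_DATE_EPOCH"]))`, where `destroot` stands for whatever directory is about to be packaged.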
> Miscellaneous Topics
> ====================
> Homebrew achieves good binary package coverage for non-default prefixes
> by scanning the build results for $prefix. In library load commands, the
> path is rewritten locally at installation time using install_name_tool(1);
> in text files, the path is simply replaced. If $prefix is found in a binary
> file, the archive is marked as non-prefix-invariant and ignored by
> non-default prefix installations.
I've thought about doing this at least for the MacPorts installer itself, to make it easier for users to install MacPorts to other prefixes. However, I figured demand for this was low, since as things stand, installing to a non-default prefix makes that installation ineligible for binaries. Extending this strategy to all ports is probably more difficult, as I suspect there are a large number of edge cases.
I would suggest that the prefix for a buildbot builder in this hypothetical scenario should be a long random string such that it is highly unlikely to occur within any project's source, whereas the string "/opt/local" occurs in many projects' sources and we might inadvertently replace something we shouldn't replace. On the other hand, continuing to use /opt/local as the build prefix might fix undiscovered prefix portability problems in some ports. However, a long prefix would have the advantage that a port that doesn't obey our LDFLAGS and thus doesn't include -headerpad_max_install_names would nevertheless have enough space in the library for the user's presumably shorter custom prefix.
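To get a feel for how many ports would be affected, the scanning step could be approximated with something like the following Python sketch. It is not Homebrew's implementation: the function name `classify_prefix_use` is invented here, and treating any file containing a NUL byte as binary is a crude heuristic. It also does not distinguish Mach-O load commands, which install_name_tool can rewrite, from other occurrences of the prefix inside a binary.

```python
import os


def classify_prefix_use(root, prefix):
    """Return (text_hits, binary_hits): files under root that contain prefix."""
    needle = prefix.encode()
    text_hits, binary_hits = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # the link target is scanned where it actually lives
            with open(path, "rb") as f:
                data = f.read()
            if needle not in data:
                continue
            rel = os.path.relpath(path, root)
            # Heuristic: a NUL byte means "binary"; anything else is "text".
            if b"\0" in data:
                binary_hits.append(rel)
            else:
                text_hits.append(rel)
    return sorted(text_hits), sorted(binary_hits)
```

Files in the first list could be rewritten by plain substitution; files in the second list would need a closer look before deciding whether the archive can be relocated at all.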
On Dec 9, 2015, at 12:31 PM, Clemens Lang wrote:
>
>> Subversion can be configured to apply the commit timestamps to files
>> (use-commit-times=yes). We could use that on the server creating the
>> port tarball for rsync. Although I would prefer a more generic
>> solution.
>
> I've checked the portindex generation code, and this would actually work
> for our solution. Enabling this setting in subversion working copies
> locally would however lead to ports not being reindexed when they should
> in some situations, so not really a good solution.
I think the problem is that the rsync server is where the portindex is generated. Since use-commit-times=yes is such an obvious Subversion feature, there must be a good reason why it hasn't been enabled on the server yet.
>> We already have been using -headerpad_max_install_names for a few
>> years, but never dared to go further and actually fix this in a
>> compiled port archive. I am surprised this works for them.
>
> So was I, which is why I included the information. Mike told me they'd
> actually get decent coverage using this approach.
I've used dylibbundler a few times successfully. It uses a similar approach. I'm not surprised it's possible to change libraries this way. I expect the problems would be the edge cases of all the other places where the prefix is recorded, outside of libraries. But it sounds like they've worked on that too.