Unreproducible builds: Time to fix, causes, and correlation with external ecosystem factors

Rahul Bajaj*, Eduardo Fernandes, Bram Adams, Ahmed E. Hassan

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review

Abstract

Context
A reproducible build occurs if, given the same source code, build instructions, and build environment (i.e., installed build dependencies), compiling a software project repeatedly generates the same build artifacts. Reproducible builds are essential for identifying tampering attempts responsible for supply chain attacks. Most research on reproducible builds treats build reproducibility as a project-specific issue. However, modern software projects are part of a larger ecosystem and depend on dozens of other projects, which raises the question of to what extent a project's build reproducibility is the responsibility of that project rather than something forced on it.
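As an illustrative sketch only (not taken from the article), the definition above can be checked by comparing cryptographic digests of the artifacts produced by two builds. The function names and the in-memory artifact dictionaries are hypothetical; real checks typically compare package files on disk.

```python
from __future__ import annotations

import hashlib


def artifact_digests(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Map each artifact name to the SHA-256 hex digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in artifacts.items()}


def is_reproducible(build_a: dict[str, bytes], build_b: dict[str, bytes]) -> bool:
    """Two builds match when they yield the same artifact names with identical bytes."""
    return artifact_digests(build_a) == artifact_digests(build_b)


# Hypothetical example: an embedded build timestamp changes the bytes of the artifact.
first = {"pkg.tar": b"code;built=2024-01-01"}
second = {"pkg.tar": b"code;built=2024-01-02"}
same = {"pkg.tar": b"code;built=2024-01-01"}
```

Under these assumptions, `is_reproducible(first, same)` holds while `is_reproducible(first, second)` does not, because a single differing byte changes the digest.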

Objective
This empirical study aims at analyzing reproducible and unreproducible builds in Linux distributions to systematically investigate the process of making builds reproducible in open-source distributions. Our study targets builds performed on 11,528 and 597,066 Arch Linux and Debian packages, respectively.

Method
We compute the likelihood of unreproducible packages becoming reproducible (and vice versa) and identify the root causes behind unreproducible builds. Finally, we compute the correlation between the reproducibility status of packages and three ecosystem factors (i.e., factors outside the control of a given package).
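The two computations named above could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the likelihood is estimated or which statistical test is used, so the per-observation transition fraction and the Cramér's V effect size below are assumptions, as are the function names and the "U"/"R" status labels.

```python
from math import sqrt


def transition_likelihood(histories, src, dst):
    """Fraction of consecutive observations in state `src` that are followed by `dst`."""
    from_src = 0
    to_dst = 0
    for history in histories:
        for prev, curr in zip(history, history[1:]):
            if prev == src:
                from_src += 1
                if curr == dst:
                    to_dst += 1
    return to_dst / from_src if from_src else 0.0


def cramers_v(table):
    """Cramér's V for an r x c table of counts (assumes no empty row or column)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


# Hypothetical status histories per package: "U" = unreproducible, "R" = reproducible.
histories = [["U", "U", "R"], ["U", "R", "R"]]
```

Here `transition_likelihood(histories, "U", "R")` counts how often an unreproducible observation is followed by a reproducible one, and `cramers_v` takes a contingency table such as reproducibility status (rows) by hardware architecture (columns).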

Results
Arch Linux packages become reproducible a median of 30 days quicker when compared to Debian packages, while Debian packages remain reproducible for a median of 68 days longer once fixed. We identified a taxonomy of 16 root causes of unreproducible builds and found that the build reproducibility status of a package across different hardware architectures is statistically significantly different (strong effect size). At the same time, the status also differs between versions of a package for different distributions and depends on the build reproducibility of a package's build dependencies, albeit with weaker effect sizes.

Conclusions
The ecosystem a project belongs to plays an important role w.r.t. the project's build reproducibility. Since ecosystem factors are outside a developer's control, future work on (fixing) unreproducible builds should consider these ecosystem influences.

Original language: English
Article number: 11
Journal: Empirical Software Engineering
Volume: 29
Number of pages: 48
ISSN: 1382-3256
Publication status: Published - Feb 2024

Keywords

  • Build environment
  • Release management
  • Reproducible build
  • Software package
  • Software security
  • Supply chain attack
