Unreproducible builds: Time to fix, causes, and correlation with external ecosystem factors

  • Rahul Bajaj*
  • Eduardo Fernandes
  • Bram Adams
  • Ahmed E. Hassan
  • *Corresponding author

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Abstract

Context
A reproducible build occurs if, given the same source code, build instructions, and build environment (i.e., installed build dependencies), compiling a software project repeatedly generates the same build artifacts. Reproducible builds are essential for detecting the tampering attempts behind supply chain attacks. Most research on reproducible builds treats build reproducibility as a project-specific issue. However, modern software projects are part of a larger ecosystem and depend on dozens of other projects, which raises the question of to what extent a project's build reproducibility is the responsibility of that project or is instead forced on it from outside.
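The definition above can be illustrated with a minimal sketch that treats a build as reproducible when repeated builds yield bit-for-bit identical artifacts. The function names and the sample byte strings are hypothetical and are not taken from the study; the embedded build date stands in for one commonly cited cause of unreproducibility.

```python
import hashlib


def artifact_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(artifacts: list[bytes]) -> bool:
    """True if every rebuild produced a bit-for-bit identical artifact."""
    digests = {artifact_digest(a) for a in artifacts}
    return len(digests) == 1


# Hypothetical artifacts from two builds of the same source.
# The second pair differs only in an embedded build date.
stable = [b"pkg-1.0 contents", b"pkg-1.0 contents"]
drifting = [b"pkg-1.0 built 2024-01-01", b"pkg-1.0 built 2024-01-02"]

print(is_reproducible(stable))
print(is_reproducible(drifting))
```

Comparing digests rather than raw bytes mirrors how artifacts are usually checked in practice, since only the hashes need to be exchanged between independent builders.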

Objective
This empirical study analyzes reproducible and unreproducible builds in Linux distributions to systematically investigate the process of making builds reproducible in open-source distributions. Our study targets builds performed on 11,528 Arch Linux packages and 597,066 Debian packages.

Method
We compute the likelihood of unreproducible packages becoming reproducible (and vice versa) and identify the root causes behind unreproducible builds. Finally, we compute the correlation between the reproducibility status of packages and three ecosystem factors (i.e., factors outside the control of a given package).
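One way to picture the first step is as an estimate of transition frequencies between reproducibility states across successive builds of each package. The sketch below is an assumption about how such a likelihood could be computed, not the authors' method; the package names, the "R"/"U" labels, and the `transition_rates` helper are all hypothetical.

```python
from collections import Counter


def transition_rates(histories: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Estimate P(next status | current status) from per-package status sequences."""
    counts: Counter = Counter()
    for statuses in histories.values():
        # Pair each build's status with the status of the following build.
        for current, nxt in zip(statuses, statuses[1:]):
            counts[(current, nxt)] += 1
    totals: Counter = Counter()
    for (current, _), n in counts.items():
        totals[current] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


# Hypothetical build histories: "R" = reproducible, "U" = unreproducible.
histories = {
    "pkg-a": ["U", "U", "R", "R"],
    "pkg-b": ["R", "U", "R"],
    "pkg-c": ["U", "R", "R", "R"],
}
rates = transition_rates(histories)
print(rates[("U", "R")])  # share of unreproducible builds followed by a reproducible one
print(rates[("R", "U")])  # share of reproducible builds followed by an unreproducible one
```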

Results
Arch Linux packages become reproducible a median of 30 days sooner than Debian packages, while Debian packages remain reproducible for a median of 68 days longer once fixed. We identified a taxonomy of 16 root causes of unreproducible builds and found that the build reproducibility status of a package differs statistically significantly across hardware architectures (strong effect size). The status also differs between versions of a package for different distributions and depends on the build reproducibility of the package's build dependencies, albeit with weaker effect sizes.

Conclusions
The ecosystem a project belongs to plays an important role in the project's build reproducibility. Since these ecosystem factors are outside a developer's control, future work on (fixing) unreproducible builds should take them into account.
Original language: English
Article number: 11
Journal: Empirical Software Engineering
Volume: 29
Issue number: 1
Number of pages: 48
ISSN: 1382-3256
DOI
Status: Published - Feb. 2024
Published externally: Yes
