April 18, 2019

Software Heritage and GNU Guix join forces to enable long term reproducibility

Our mission is to collect, preserve, and share the source code of all software that is publicly available, including its full development history.

To this end, we already periodically fetch and archive source code from a growing set of origins: release tarballs from the GNU servers, repositories from GitHub, packages from PyPI, and much more.

Software Heritage provides long term access to software source code, so users can retrieve it even if it disappears from the platform where they are used to finding it or, worse, when the platform itself goes away.

Today we are delighted to announce the first results of a collaboration with GNU Guix, a stepping stone toward long term reproducibility of research software.

Meet GNU Guix

Guix is an advanced distribution of the GNU operating system, developed by the GNU Project, that has made reproducibility its core mission, which sets it apart from other tools.

This property is crucial in many cases, and notably for reproducible science, on which the Guix-HPC effort focuses specifically.

To this end, Guix is built around package definitions that specify how to (re)build the package and, first of all, the URL where the package’s source code can be found.

This source code can be a “tarball” fetched from a web site, or a specific revision checked out directly from a development platform.

Unfortunately, URLs can break, projects can migrate, and development platforms can shut down, so one of GNU Guix’s prime concerns was: how can we ensure that a system can still be built even when the original source code becomes unavailable?

That’s what Software Heritage is for!

Since Software Heritage archives source code for the long term, Guix can fall back to the Software Heritage archive whenever it fails to download source code from its original location. The fallback is designed so that package definitions don’t need to be modified: they still refer to the original source code URL, but the downloading machinery transparently turns to Software Heritage when needed.
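
To make the idea concrete, here is a minimal sketch in Python of what such a fallback could look like. It is not Guix's actual implementation (Guix is written in Scheme, and its real logic is described on the Guix blog). The helper names sha256_hex, archive_url, http_fetch and fetch_with_fallback are hypothetical, and the archive URL pattern is an assumption based on Software Heritage's public content-by-hash API. The key point is that the caller only knows the original URL and the expected hash of the source code; the archive is tried only when the original location fails.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable


def sha256_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def archive_url(sha256: str) -> str:
    """Build a lookup URL keyed by content hash.

    Assumption: the archive serves raw content by SHA-256 at this path.
    """
    return ("https://archive.softwareheritage.org"
            f"/api/1/content/sha256:{sha256}/raw/")


def http_fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    urls: Iterable[str],
    expected_sha256: str,
    fetch: Callable[[str], bytes] = http_fetch,
) -> bytes:
    """Try each URL in order and return the first payload whose hash matches."""
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            errors.append(f"{url}: {exc}")
            continue
        if sha256_hex(data) == expected_sha256:
            return data
        errors.append(f"{url}: hash mismatch")
    raise LookupError("all sources failed: " + "; ".join(errors))
```

A caller would pass the original URL first and archive_url(expected_sha256) second. Because the result is checked against the expected hash, it does not matter which location actually served the bytes, which is why the package definition itself can stay unchanged.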

Support for Software Heritage was integrated into Guix in November 2018, making it the first free software distribution backed by a stable archive (a detailed presentation of what goes on under the hood can be found on the Guix blog).

More work is needed to ensure that all the source code referenced by Guix packages is safely archived in Software Heritage, but the first big step has now been made, showing the value of the mission that Software Heritage has undertaken.
