January 5, 2022

Software Heritage in 2021: five years already!

Despite the health crisis we have been experiencing since March 2020, Software Heritage continues to pursue its mission of collecting, preserving and sharing software source code, a mission that started many years ago and was unveiled to the public on June 30th, 2016.

Expanding the archive: a collective effort

The archive is growing steadily and now counts over 11 billion unique source files from more than 170 million projects, drawn from a growing number of origins. We are grateful to the volunteers who submitted over 100,000 Save Code Now requests, and to the many expert contributors who helped expand the coverage of the archive (more on this below).

Overall, there has been significant technical progress, described in the Software Heritage 2021 technical roadmap, which gives an overview of what we are working on in a number of areas: Collect, Preserve, Share, Organize, Measure, Documentation, Community and Tooling. Many of these tasks involve improvements to the Software Heritage infrastructure and software stack that may go unnoticed, despite being very important: for example, we overhauled the archive counters, added a breakdown of the archived origins, and made Save Code Now requests complete in a matter of minutes, which is a real game-changer for users.

Documentation has been improved significantly, both for developers and for users, and we invite you to visit the brand-new Features Page, which provides valuable insight into the many new functionalities now available, as well as a peek under the hood.

Building the universal source code archive is not an easy task: we need to cope with a variety of technical challenges, and succeeding in the long-term mission of Software Heritage requires a collective effort.

This year, we continued our partnership with the Alfred P. Sloan Foundation and the NLnet Foundation to provide grants for experts who are willing to get involved and build the many connectors needed to expand the coverage of the archive.

A grant was awarded to Octobus to work on archiving SourceForge and on adding Bazaar to the list of version control systems supported by the Software Heritage ingestion pipeline. A grant was also awarded to OCamlPro to help increase the coverage of the Software Heritage archive by integrating it with the OCaml ecosystem, another to Easter-Eggs to help us build the next-generation object storage for Software Heritage, and a fourth to Castalia Solutions to develop the Maven Repositories connector to archive the Maven ecosystem.

We are still calling on all experts to step up and express their interest in participating! Please fill in this simple form if you are interested.

Growing an international community 

The Software Heritage infrastructure now offers a wealth of stable features that can be used in a variety of applications, ranging from cultural heritage to science, industry and public administration, and the time has come to foster broad adoption. To this end, we launched the Ambassador program, and in 2021 we were delighted to welcome 17 volunteer ambassadors willing to contribute to community engagement and to accelerate the adoption of Software Heritage in the many fields where it brings groundbreaking benefits. You can contact them to learn more about Software Heritage, and you can become an ambassador too.

We once again welcomed Google Summer of Code students, giving more student developers access to open source software development.

Recognizing Software as a key pillar of Open Science

The year 2021 was a turning point for software in Open Science, with several highly relevant events. On July 6th, we were excited to see software fully recognized as a key pillar of Open Science in the second national plan for Open Science, unveiled by the French Ministry of Research. This landmark official government document lays out a groundbreaking strategy for software in research, in which Software Heritage plays an important role.

On September 29th, the European Open Science Cloud (EOSC) established several task forces dedicated to improving the infrastructures supporting Open Science in Europe, and we are happy to co-chair the one focused on infrastructures for quality research software.

In November, the UNESCO member states approved the recommendation on Open Science, which now explicitly mentions Open Source software as a key component of Open Science. It also states that open science infrastructures should be “based, as far as possible, on open source software stacks” and “organized and financed upon an essentially not-for-profit and long-term vision”, which is exactly the approach we have taken in Software Heritage since the very beginning of our journey.

Software Heritage for cybersecurity

This year, awareness rose significantly about the increasing impact of cybersecurity threats on society as a whole. The executive order issued in May by the President of the United States has a full section dedicated to “Enhancing Software Supply Chain Security”, which includes a call for “ensuring and attesting, to the extent practicable, to the integrity and provenance of open source software used within any portion of a product”. This gave us the opportunity to show how the Software Heritage archive can contribute to improving the software supply chain by ensuring availability, guaranteeing integrity, and enabling traceability of all publicly available software source code.

Celebrating five years at UNESCO: our first International Conference

On November 30th, a special event took place at UNESCO’s headquarters to celebrate five years of Software Heritage. It was an opportunity to take stock of the achievements and status of Software Heritage, and to highlight the relevance of building a universal software source code archive in today’s dynamic digital innovation landscape.


On this occasion, we brought together the growing international community of Software Heritage, welcomed CEA as our first diamond sponsor, and unveiled Software Stories, a novel approach to presenting the history of landmark software projects, developed in collaboration with the team and the University of Pisa.

A detailed account of the event can be found on the dedicated UNESCO web page, as well as on our blog. A five-minute celebratory video recapping the key milestones of these first five years was also unveiled.

Thanking our sponsors


We are very grateful to all of our sponsors and partners for maintaining, and even increasing, their support of our mission, despite the difficult year we all went through again in 2021. This is essential to ensure that Software Heritage can continue to develop its core infrastructure, and roll out the services that will make it useful to all the stakeholders!

Looking ahead

In the year to come, our top priority will again be to ensure that the key functionalities Software Heritage offers are rock solid, completing the work already started to improve key components of the infrastructure under the hood. We look forward to rolling out the first operational mirrors, improving the resilience of the preservation effort started over five years ago, continuing to expand the archive coverage, and integrating extrinsic metadata sources that will help better describe its contents.

And most importantly, we will continue to expand the international community around Software Heritage: collecting, preserving and sharing all software source code is a humbling undertaking that requires institutional support from sponsors and partners (take a look at the different sponsorship program possibilities), as well as individual engagement, ranging from collaborators to contributors, and from ambassadors to the donors who are answering the call of the ongoing end-of-year fundraising campaign (yes, you should join too!).

We look forward to working with all interested parties: let’s work together to preserve our past, improve our present, and prepare a better future.
