January 8, 2018

Getting up to speed, clear sky ahead

One year has passed since we posted our first activity report, so now is a good time to look back at what was accomplished in 2017 and to offer some perspective on the future.

Our mission and our principles

Here at Software Heritage, we have taken on the mission of collecting, preserving, and sharing the source code of all publicly available software.

We do this for multiple reasons. To preserve the scientific and technological knowledge embedded in software source code, which is a precious part of our heritage. To enable better software development and reuse for society and industry, by building the largest open software knowledge database and making possible a broad range of value-added applications. To foster better science, by assembling the largest curated archive for software research and building the infrastructure for preserving and sharing research software.

We do this now because we are at a turning point. On the one hand, most of the founding fathers of computer technology are still around and willing to share their knowledge, but only for a limited time. On the other hand, we face an increasing risk of massive loss of source code developed by the Free and Open Source community, in particular because code hosting sites shut down when their popularity declines.

This challenging and humbling undertaking must be carried out with a long-term perspective, which is why we have published a white paper stating a first set of principles that guide our work, including transparency, openness, and collaboration.

We are also well aware that the success of our mission requires widespread recognition, proper resources, and sound scientific and technical foundations.

Raising awareness at the political level

2017 has been an exceptional year for raising awareness of the importance of software source code at the highest levels.

In early January, an intense week in Chile, including a meeting with the President of the Republic, led to headline news and official support.

On April 3rd, months of patient preparatory work culminated in the signature of a landmark agreement with Unesco on software source code preservation and access, in the presence of the President of the French Republic, ambassadors from many countries, and four hundred distinguished guests.

On September 28th, we took part in the Unesco working groups for the International Day for Universal Access to Information. The outcome document, the Mauritius Call, advocates policies, standards, and legislation to enable access to and preservation of information, explicitly including our software heritage.

Growing support

When we unveiled the Software Heritage initiative a year and a half ago, on June 30th, 2016, we were happy to count Microsoft and DANS as early sponsors, along with over twenty endorsers.

This year, we have welcomed six more sponsors: Société Générale, Intel, Huawei, Nokia Bell Labs, the University of Bologna and, most recently, GitHub.

In November, representatives of all our sponsors were invited to Paris for a first face-to-face meeting, which was a great occasion to exchange ideas on the progress made so far and to discuss future developments.

Connecting with diverse communities

We have reached out to many communities, explaining what we do, and inviting them to join forces.

The keynote at FOSDEM in February was a first great opportunity to connect with our fellow Free and Open Source software developers, followed by the OSCON presentation in May and the EclipseCon keynote in October, among others.

We kept our own Computer Science community up to date on Software Heritage on many occasions, in particular at Inria’s 50th anniversary celebration, the European Computer Science Summit, and the ACM working group on reproducibility. Early contacts have been made with several research teams around the world that are interested in the unique potential offered by the Software Heritage archive.

To address the scientific community at large, we joined forces with DANS and the SSI to spark interest in software source code within the Research Data Alliance, an effort that led to the creation of a group specifically dedicated to it. The relevance of Software Heritage for streamlining scientific software citation has been noticed, and we launched a fruitful collaboration on researcher-initiated software deposit with the French national open access repository, HAL.

Last, but not least, we made contact with many other preservation initiatives: the Computer History Museum, BNF, the Software Preservation Network, Persist, Living Computers: Museum + Labs, and CINES.

Many of the talks given on these occasions are available online, some with accompanying videos.

Building the infrastructure

A great deal of work has gone into Software Heritage’s own infrastructure: most of it happened under the hood, but some results are already visible.

A top priority is still to expand our collection of source code. We are quite proud to have crossed the mark of 4 billion unique source code files, harvested from over 70 million origins, but we know that a long queue is still waiting out there. So we have put considerable effort into making it easy for external contributors to add support for more code hosting platforms, with special thanks to a great collaborator: you can now write your own lister, the component that enumerates the repositories hosted on a platform, in a few lines of code!

And yes, we know that a crowd is queuing up in front of the doors of the great library of source code we are building, eager to have a look at its contents. A first step was taken by opening a public API that gives access to part of the archive, and thanks to the new members of our team, which has grown significantly over the past months, we will now be making faster progress.

As we prepare to fully open the doors to the archive, the question of the terms of use for the data we collect has required special care and attention: we are happy to have taken a significant step forward by publishing the terms of use for our public API.

Looking ahead

Committed as we are to a long-term mission, we will patiently tackle, one after the other, the many challenges that lie ahead, building paths for collaboration with kindred spirits around the world.

This coming year we will prioritize developments that will enable you to browse and download the contents of the Software Heritage archive. A software deposit mechanism will also be made available for connecting with specific platforms, in particular for research software.

We will also focus on the technology, process and legal framework for establishing the first mirrors of the Software Heritage archive.

On the organisational side, a very important step will be the establishment of the Software Heritage Foundation, after many long months of preparatory work. The Foundation will provide the official non-profit structure that oversees the development of Software Heritage in the long term, as well as the proper vehicle for accepting contributions and donations of any size from all those who share our vision.

There is clear sky ahead of us: it’s time to take off for another year.

— Roberto Di Cosmo