January 17, 2017

We have come a long way, and the road ahead

We unveiled the Software Heritage initiative exactly six months ago, on June 30th, 2016. Now seems a good time to look back at the origin of the project, where we started, and what we have accomplished so far, and to get a glimpse of the future.

A (not so long) time ago…

The first informal discussions on what has now become Software Heritage started back in the spring of 2014 (as often happens, around a coffee machine and a teapot) at IRILL in Paris. In the months that followed, some serious preliminary work had to be done: exploring the state of the art, charting related initiatives, elaborating the vision we now proudly share, estimating the effort required, and finding the right host and the initial support for starting the project.


It was an extremely intense period, but if one has to pick a symbolic starting date, that would definitely be October 21st, 2014: on that day, Inria’s director, Antoine Petit, after an in-depth analysis of the initial case statement, encouraged me to go forward and propose the project to the Inria decision bodies.


And indeed, after going through some serious scrutiny, Inria’s support for the project quickly became a reality: in March 2015 we got the go-ahead, the advisory committee was put in place, and by September 2015 we had our logo, the current team was already complete, and we were working at full speed.

… we started collecting the source code of the World …


During the summer of 2015, after a much needed inaugural dinner, our initial infrastructure was set up, and the operations for collecting the source code repositories were started.


The graphs on our website trace the steady growth of the archive since then, and we have started providing some insight on what is going on under the hood.

We are especially happy to have arrived just in time to collect the contents of Gitorious and Google Code before it was too late, thanks to some wonderful people who were eager to help.

What makes our mission special is that we do not just check out a copy of a given software project when we crawl it: we trace all of its development history, as contained in its version control system, and we do this again every time we visit it, building a sort of Wayback Machine of software development.
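To make the idea concrete, here is a minimal sketch of what "tracing the whole history on every visit" could look like. It is only an illustration under stated assumptions: the `Archive` class, its `visit` and `heads_at` methods, and the shape of the inputs are hypothetical names chosen for this example and do not describe the actual Software Heritage data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Archive:
    """Toy archive (hypothetical): stores each revision once, logs every visit."""

    revisions: dict = field(default_factory=dict)  # revision id -> parent ids
    visits: list = field(default_factory=list)     # (origin, timestamp, branch heads)

    def visit(self, origin, history, heads, when=None):
        """Record one crawl of `origin`.

        `history` maps revision id -> list of parent ids (the full VCS graph);
        `heads` maps branch name -> revision id at the time of the visit.
        Returns the number of revisions that were new to the archive.
        """
        new = 0
        for rev, parents in history.items():
            if rev not in self.revisions:  # already-known revisions are not duplicated
                self.revisions[rev] = list(parents)
                new += 1
        when = when or datetime.now(timezone.utc)
        self.visits.append((origin, when, dict(heads)))
        return new

    def heads_at(self, origin, when):
        """Branch heads seen at the latest visit of `origin` not after `when`."""
        seen = [v for v in self.visits if v[0] == origin and v[1] <= when]
        return max(seen, key=lambda v: v[1])[2] if seen else None
```

In this sketch, a second visit to the same origin only adds the revisions that appeared since the first one, while the visit log keeps a record of where each branch pointed at every visit, which is what allows looking back in time.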

…looking for partners and spreading the word

For Software Heritage to keep its promise of long-term source code preservation, it is essential to bring together a broad set of partners, from cultural heritage to education, from research to industry, to build support for the mission, and to pave the way for collaboration. Naturally, we opened up our source code, all of which is Free/Open Source Software.

In parallel with the technical development, an intense effort was dedicated to the presentation of the vision of the project to a great many people, institutions, industries and organisations: the talks given in the course of this journey are available on our wiki.

The breadth of the scope of Software Heritage means that there are different aspects that will appeal differently to different stakeholders. Hence we gave different presentations focusing on the digital preservation issues, the science crisis and software reproducibility, the scientific challenges and the technical architecture of this unprecedented archive, as well as industry applications, like compliance.

Where we stand today…

After unveiling Software Heritage to the world six months ago, we are happy to see quite a lot of media coverage, which is tracked in our public git annex repository, and broad support for our mission and for our way of implementing it with openness and transparency.


We were delighted to be invited to deliver a keynote at OSCON London and to receive our first award on the occasion of the Paris Open Source Summit this November.

Several major IT players are already sponsoring the project, and we hope to see more coming, especially from the Free/Open Source world, which is the first to be concerned by our initiative.

More infrastructure is now available for the collection and archival effort, and we are working hard to put in place a first mirror in the cloud.

…and a glimpse of the future

Looking back, we made a lot of progress, but looking forward, there is a lot more left to be done… here is a glimpse of what is on our roadmap for the future.

Of course, our top priority now is to collect more and more existing source code: mainstream development platforms started closing down last year, and we need to focus on endangered content first. This means discovering more software sources and ingesting their contents. But we hope to be able, next year, to open the doors of this great library of source code for all to read, and for scientists to analyze: that’s why we are hiring.

We would also like to start working on connecting Software Heritage with existing metadata information and in particular with Open Access and Open Data repositories. And we might explore ways of enriching the storage layer with a replicated and distributed infrastructure that will ease data access.

If you share our vision and like what we do, please explore our website, follow the pointers, and you will find many ways of lending a hand: it can be as simple as spreading the word.

— Roberto Di Cosmo