March 8, 2018

Towards a ClearlyDefined Software Heritage

Here at Software Heritage, we have taken on the mission of collecting, preserving, and sharing the source code of all available software.

This is a complex task that involves a broad spectrum of activities. We are working on automating the harvesting of existing publicly available sources, one platform at a time: we do this patiently ourselves, as in the recently announced, now ongoing automated archival of Debian packaged source code, and we have paved the way for everybody to contribute, providing step-by-step instructions on how to build a new lister for other platforms out there.

We are actively working on the technical and legal scaffolding needed to set up a network of independent Software Heritage mirrors, which will ensure that our archive is preserved over the very long term, as stated in the principles listed in our white paper.

We are putting considerable energy into developing a modern front-end that will soon allow you to browse the tens of millions of projects contained in our archive.

And we are also establishing important collaborations with kindred spirits worldwide who share the same passion and dedication to curating and preserving the source code of software, which is a precious asset for all of mankind.

Today, we are delighted to announce that we are partners of the ClearlyDefined initiative, a very welcome collaborative effort sparked by industry players and international Free and Open Source Software organisations to curate essential information about open source software projects, helping to amplify their success through wider adoption and confidence.

The scope of ClearlyDefined is broad, covering licensing, security, accessibility and other essential information. We particularly support their focus on factual licensing data such as licenses, copyright holders, and source code location, which aligns perfectly with the basic principles of Software Heritage: knowing, for each piece of information we store, where, when and how we got it.

Software Heritage will provide a persistent, long-term home for curated data coming from the ClearlyDefined initiative, ensuring the results of this important effort are preserved, and enriching the growing metadata we maintain about the source code we collect from all around the world.

Yes, we are moving forward towards a ClearlyDefined Software Heritage archive!

— Roberto Di Cosmo