We are building an essential infrastructure meant to ensure three main properties for the source code we collect:

  • availability: the code will be stored, preserved and made accessible in the long term
  • traceability: each software component will get a unique identifier that can be relied upon in the long term
  • uniformity: despite the great variety of origins, all of the source code collected in our archive will be accessed through the same uniform API

We base our infrastructure on three main pillars that provide a solid foundation.

Open source

Long-term preservation efforts cannot be based on black boxes that hide the process behind closed source. We are long-time Free/Open Source Software developers and advocates, and our code and specifications will be open.

Open architecture

We are designing a complex software architecture. Its design and specifications will be made public.

Free/Open Source Software

All the code developed for Software Heritage will be released under a Free and Open Source Software (FOSS) license.

Collaborative development

We will adopt an open development process, and strive to create a development community around all components of the Software Heritage infrastructure.

Intrinsic unique identifiers

Each software component is assigned a unique identifier that is intrinsically bound to it. It does not rely on third parties, so it is truly persistent, and everybody can build on it.

Unique identifiers

Every software artifact receives a unique identifier. This reference can be used in textbooks, documentation, build instructions and many other places to build a consistent web of knowledge.
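To make "a reference usable in a textbook" concrete, here is a minimal sketch that parses such an identifier. It assumes the core syntax of SWHIDs, the identifier scheme specified by Software Heritage, of the form `swh:1:<object type>:<40 hex digits>`, with the object types `cnt` (content), `dir` (directory), `rev` (revision), `rel` (release) and `snp` (snapshot). The names `CoreSWHID` and `parse_core_swhid` are hypothetical helpers for illustration, not part of any official library, and qualifiers that may follow the core identifier are deliberately not handled.

```python
import re
from typing import NamedTuple

# Assumed core SWHID syntax: swh:<scheme version>:<object type>:<40 hex digits>
SWHID_CORE_RE = re.compile(r"swh:(1):(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


class CoreSWHID(NamedTuple):
    """Hypothetical container for the three parts of a core identifier."""
    version: int
    object_type: str
    object_id: str


def parse_core_swhid(text: str) -> CoreSWHID:
    """Parse a core identifier, raising ValueError if it is malformed."""
    # fullmatch: the whole string must match, with no trailing characters.
    match = SWHID_CORE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {text!r}")
    version, object_type, object_id = match.groups()
    return CoreSWHID(int(version), object_type, object_id)


ref = parse_core_swhid("swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad")
print(ref.object_type, ref.object_id)
```

Because the syntax is fixed and self-describing, a citation in a book or a build script can be checked for well-formedness without contacting any service.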

Intrinsic identifiers

Software Heritage uses intrinsic identifiers, which can be computed directly from the software artifact itself. There is no need to rely on a third party to know whether a given identifier corresponds to a given artifact.
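The sketch below illustrates what "computed directly from a software artifact" means for the simplest case, a single file's contents. It assumes that content identifiers follow the SWHID convention of reusing Git's blob hash (SHA-1 over a `blob <size>` header, a NUL byte, then the raw bytes), prefixed with `swh:1:cnt:`. Directories, revisions and other object types use more involved serializations and are not covered here; the function names are illustrative only.

```python
import hashlib


def content_sha1_git(data: bytes) -> str:
    """Hash bytes as Git hashes a blob: SHA-1 over 'blob <size>', a NUL byte, then the data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def content_swhid(data: bytes) -> str:
    """Build a core content identifier from the bytes alone, with no registry lookup."""
    return "swh:1:cnt:" + content_sha1_git(data)


def matches(identifier: str, data: bytes) -> bool:
    """Anyone holding the bytes can check an identifier by recomputing it."""
    return content_swhid(data) == identifier


swhid = content_swhid(b"hello world\n")
print(swhid)  # swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(matches(swhid, b"hello world!\n"))  # False
```

Since the identifier depends only on the bytes, changing a single character yields a different identifier, and no third party is needed to tell whether an identifier and an artifact belong together.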

Distributed and multistakeholder infrastructure

“Let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.” — Thomas Jefferson

No single point of failure

We are planning a distributed infrastructure that will make it possible to replicate all the content across a large set of peer nodes.

This is essential to prevent information loss, and it will also greatly simplify sharing.
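A toy model can show why replication combined with intrinsic identifiers removes single points of failure. This is a hypothetical sketch, not Software Heritage's actual replication protocol: `Peer`, `replicate` and `fetch` are made-up names, each peer is just an in-memory dictionary, and the key is a plain SHA-1 of the bytes, a simplification. The point is that a reader can fall back from an offline node or a damaged copy to any other peer, and can verify each copy locally by rehashing it.

```python
import hashlib
from typing import Dict, List


def object_id(data: bytes) -> str:
    # Intrinsic key: plain SHA-1 over the raw bytes (a simplification for this sketch).
    return hashlib.sha1(data).hexdigest()


class Peer:
    """Hypothetical peer node holding a content-addressed store in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.online = True


def replicate(data: bytes, peers: List[Peer]) -> str:
    """Store the same object on every peer under a key derived from its bytes."""
    key = object_id(data)
    for peer in peers:
        peer.store[key] = data
    return key


def fetch(key: str, peers: List[Peer]) -> bytes:
    """Return the first copy that is reachable and still hashes to its key."""
    for peer in peers:
        if not peer.online:
            continue  # skip unreachable nodes
        data = peer.store.get(key)
        if data is not None and object_id(data) == key:
            return data  # copy verified locally, no trusted party needed
    raise KeyError(key)


peers = [Peer("a"), Peer("b"), Peer("c")]
key = replicate(b"int main(void) { return 0; }\n", peers)
peers[0].online = False                   # one node goes down
peers[1].store[key] = b"corrupted bytes"  # another copy is damaged
print(fetch(key, peers))  # the intact copy held by peer "c" is returned
```

With more independent peers, more simultaneous failures can be tolerated before an object becomes unreachable.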

A multistakeholder network of peers

We will actively seek to grow a multistakeholder network of peers.

New partners will be able to join our efforts easily along the way, thanks to our open source code and open specifications.