Software Heritage is a long-term effort to build a common infrastructure to collect, preserve and share the source code of all publicly available software.
We are delighted to share some great news about the intrinsic identifiers, called SWHIDs, that the Software Heritage archive provides for the tens of billions of software artifacts that it preserves:
- the Software Package Data Exchange (SPDX) specification includes SWHIDs in its recently published version 2.2
- the updated full specification of SWHIDs is available
- the swh prefix used in SWHIDs is now registered with IANA
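For readers who want to handle SWHIDs programmatically, here is a minimal Python sketch of their syntax, based on a simplified reading of the specification. A core identifier has the form `swh:1:<type>:<hash>`, where `<type>` is one of `cnt`, `dir`, `rev`, `rel` or `snp` and `<hash>` is 40 lowercase hexadecimal digits. It may be followed by `;key=value` qualifiers. The names `Swhid` and `parse_swhid` are hypothetical, and the sketch checks syntax only: it neither validates qualifier names nor percent-decodes their values.

```python
import re
from dataclasses import dataclass, field

# Core SWHID: "swh:1:<object type>:<40 hex digits>" (simplified reading of the spec).
_CORE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")


@dataclass
class Swhid:
    object_type: str
    object_id: str
    qualifiers: dict = field(default_factory=dict)


def parse_swhid(text: str) -> Swhid:
    """Split a SWHID into its core part and optional ';key=value' qualifiers."""
    core, *raw_qualifiers = text.split(";")
    match = _CORE.match(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    qualifiers = {}
    for item in raw_qualifiers:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed qualifier: {item!r}")
        qualifiers[key] = value  # values are kept as-is, not percent-decoded
    return Swhid(match.group(1), match.group(2), qualifiers)


if __name__ == "__main__":
    print(parse_swhid("swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2;lines=9-15"))
```

Keeping the core identifier separate from the qualifiers mirrors the fact that only the core part is intrinsic: qualifiers add context (such as a line range) but do not change which object is designated.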
As detailed in previous articles, SWHIDs must not only be unique and persistent, but must also support intrinsic integrity checking. This is essential to ensure that the web of knowledge built around the Software Heritage archive stands the test of time: for reproducibility of research results and Open Science, as well as for building trusted software bills of materials for industry.
This is why Software Heritage Identifiers (SWHIDs) are based on cryptographically strong hashes and generalized Merkle trees, as in popular distributed version control systems like Git or Mercurial.
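To illustrate the Merkle tree idea, the Python sketch below computes content and directory identifiers the way Git computes blob and tree hashes, which is the scheme SWHIDs use for these two object types. It is a simplified sketch: it assumes an in-memory directory given as a `dict` mapping names to either `bytes` (a regular, non-executable file) or a nested `dict` (a subdirectory), and it ignores executables and symbolic links. The function names are illustrative, not part of any Software Heritage library.

```python
import hashlib


def git_style_hash(kind: bytes, payload: bytes) -> bytes:
    """SHA-1 over a '<kind> <length>' header, a NUL byte, then the payload."""
    header = kind + b" " + str(len(payload)).encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def content_id(data: bytes) -> bytes:
    # Leaves of the Merkle tree: file contents, hashed as Git blobs.
    return git_style_hash(b"blob", data)


def directory_id(tree: dict) -> bytes:
    # Inner nodes: each entry embeds the identifier of its child, so any
    # change deep in the tree propagates up to the root identifier.
    entries = []
    for name, node in tree.items():
        raw = name.encode("utf-8")
        if isinstance(node, dict):
            # Subdirectory: mode 40000, sorted as if its name ended with "/".
            entries.append((raw + b"/", b"40000", raw, directory_id(node)))
        else:
            # Regular file: mode 100644.
            entries.append((raw, b"100644", raw, content_id(node)))
    entries.sort(key=lambda entry: entry[0])
    payload = b"".join(
        mode + b" " + raw + b"\0" + child_id
        for _, mode, raw, child_id in entries
    )
    return git_style_hash(b"tree", payload)


def to_swhid(object_type: str, digest: bytes) -> str:
    return f"swh:1:{object_type}:{digest.hex()}"


if __name__ == "__main__":
    project = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 0; }\n"}}
    print(to_swhid("cnt", content_id(project["README"])))
    print(to_swhid("dir", directory_id(project)))
```

Changing a single byte in `src/main.c` changes the identifier of `src`, and therefore of the root directory. This is what makes integrity checking intrinsic: anyone holding an identifier can recompute it from the data and compare.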
What makes SWHIDs special is that they do not depend at all on the version control system, if any, used to develop a software project: every software artifact ingested into the Software Heritage archive gets these identifiers.
It’s easy to obtain them from the Permalinks sidebar present on every page of the Software Heritage archive, and you can also compute them locally on your machine with the standalone swh-identify tool.
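The archive sidebar and swh-identify remain the reference ways to obtain SWHIDs. As a rough illustration of what happens for a single file, and of how a SWHID doubles as an integrity check, here is a small Python sketch. It assumes that content SWHIDs are Git-style blob hashes and handles individual files only, not directories or other object types. The script name `swhid_file.py` and the functions `content_swhid` and `matches_swhid` are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def content_swhid(data: bytes) -> str:
    # Git-style blob hash: SHA-1 over "blob <size>", a NUL byte, then the raw bytes.
    header = f"blob {len(data)}\0".encode()
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def matches_swhid(data: bytes, expected: str) -> bool:
    # Integrity check: recompute the identifier from the bytes and compare it
    # with the core part of the expected SWHID (qualifiers after ';' are ignored).
    return content_swhid(data) == expected.split(";", 1)[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python swhid_file.py FILE [EXPECTED_SWHID]
    data = Path(sys.argv[1]).read_bytes()
    print(content_swhid(data), sys.argv[1])
    if len(sys.argv) > 2:
        print("OK" if matches_swhid(data, sys.argv[2]) else "MISMATCH")
```

Because the identifier is derived only from the file's bytes, the check needs no network access and no trust in whoever supplied the file: if the recomputed SWHID matches, the content is the one that was referenced.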
Yes, SWHIDs are here, and they will soon come in handy in one of your use cases…