Close

Features

Software Heritage features

Sharing the source code commons with a set of trustworthy and open services that provide access to the largest source code library in the world.

Browse & Search

The SWH archive is the gateway to all captured source code and its entire development history. With the browsable platform, it is possible to visualize all the visits made to a given location of the code (collected from different forges, package managers and distros) and read the source code content captured.

Software Hash IDentifier (SWHID)

The Software Hash IDentifier (SWHID), a universal identifier for software pioneered by Software Heritage, is officially the ISO/IEC international standard 18670. SWHID provides intrinsic, verifiable identifiers for long-term traceability and reproducibility. A major step for a more robust digital ecosystem, driven by community collaboration.

Citation feature

Software Heritage offers built-in citation support, a significant step in acknowledging software as a legitimate research output.
With this feature, researchers, users, and readers can now easily generate and copy a BibTeX citation directly from Software Heritage into their .bib files.
Citing a specific software version—or even a precise code fragment—is just as simple: select the version or highlight the lines of code, and you’re ready to go.

Download

The Vault provides a way to reconstruct and download self-contained bundles from the archive. These bundles, containing directories and revisions, can then be imported locally, for instance into a Git repository, using either the web platform or the API.

For more information

Go to the download directory API endpoint

Save Code Now

Archiving every repository globally takes time, especially with frequent daily updates. That’s why we offer the «Save Code Now» service, allowing users to proactively request a save to the SWH archive.

Go to the API endpoint

Deposit

The deposit feature is a SWORD 2.0 Server implementation. S.W.O.R.D (Simple Web-Service Offering Repository Deposit) is an interoperability standard for digital file deposit. The deposit allows a client (a repository, e.g. HAL) to submit software source archives and its associated metadata to the SWH archive. Metadata can be also submitted referencing a repository url (origin) or a SWHID.

For more information

Add Forge Now

With the «Add forge now» service, Software Heritage users can submit a forge URL to be included in our regular archiving schedule.
All of these requests undergo a validation workflow that includes curation and a compatibility check with Software Heritage’s tools.

Crawling

The SWH archive harvests source code from numerous sources and converts all the source code into a single and universal data structure, an enormous Merkle-directed acyclic graph [Merkle, 1987], a classical cryptographic construction that combines a tree and a hash function.

There are three phases:  Listing software sources, scheduling updates, and collecting the software artifacts into the archive.

Behind the scenes

Archiving all source code is a significant challenge, so we’ve implemented various mechanisms to ensure its preservation from various sources.

API

API access is over HTTPS.  All API endpoints are rooted at https://archive.softwareheritage.org/api/1/ and the data is sent and received as JSON by default.

You can jump directly to the  endpoint index , which lists all available API functionalities, or read on for more general information about the API.

For more information

Architecture

Archiving forge repositories and package manager source code are different processes, a complexity amplified by the evolution of version control. The SWH architecture harmonizes these varied sources into a robust system.

Data model

Software Heritage’s data model organizes collected information around «software artifacts,» with a hierarchy of contents, directories, revisions, and releases. Provenance is tracked using origins, visits, and snapshots. Read more in  Software Heritage: Why and How to Preserve Software Source Code.

Mirrors

SWH mirrors are full copies in sync with the Software Heritage universal source code archive, operated independently from the Software Heritage initiative. Mirrors improve software availability, prevent information loss, and ultimately ensure unfettered access to software source code for all, reducing the risk of data loss due to uncontrolled events.

For more information

Metadata

SWH collects and extracts metadata that describes and provides additional information on source code.

  • Extrinsic metadata are metadata which aren’t found in the software source code.
  • Intrinsic metadata are metadata included in the source code, in a specific file or as part of a source code file.

The metadata indexer docs

blog post

Indexing

The swh-indexer module is responsible for processing source code files to extract information with the following objectives:

  • mimetype
  • ctags
  • language
  • fossology-license (detecting the license of a file)
  • Intrinsic descriptive metadata which can be found in metadata files in the source code (e.g package.json, codemeta.json, pom.xml)