April 23, 2026

One Year of SWHID as the ISO/IEC 18670 International Standard


For a decade, the Software Heritage project used a specialized syntax to track billions of code artifacts in its Archive. Over time, it became clear that the industry required more than an archival tool; it needed a universal method for verifying software integrity. This demand drove the technology to evolve into a platform-independent standard: the SoftWare Hash IDentifier (SWHID).

This shift from a project-specific tag to a universal identifier was cemented on April 23, 2025, when the specification was officially published as ISO/IEC 18670.

This ‘paper’ anniversary of ISO/IEC 18670 marks a shift toward broad implementation. While the formality of ISO standardization can sometimes suggest a project has reached its final, static form, the SWHID ecosystem remains an active, community-driven effort.


The standard is currently supported by a range of reference implementations in languages like Rust, Python, and Go, alongside a public test suite designed to prevent “standard drift” across different platforms. We invite everyone, including developers, educators, researchers, and policy makers, to adopt it, build on it, and share it. Explore the spec on swhid.org or in the free public specification. From there, you can learn how to implement the standard or join the working group to help shape its future.

While the technical foundation is set, the real-world application is where the most valuable lessons are learned. Agustín Benito Bethencourt has spent this watershed year talking to dozens of industry pros about SWHID. A Software Heritage Ambassador since 2023, he shares lessons from his recent presentations.

When you’re talking to a general audience, how do you make the case that we actually need a new way to identify software?

After a few months on the road talking about SWHID as well as writing about it, I’ve started noticing some patterns. I’ve distilled these into a few core lessons that I keep bringing back to the community.

The first is the need to raise awareness of how relevant software identifiers are, especially now that the volume of software that companies manage is growing rapidly. Artificial intelligence is boosting efficiency at an unprecedented rate, and that brings new challenges. Managing increasing amounts of software is one of them. Organizations that deal with very large amounts of software rarely work with it directly. They do it through identifiers, which are easier to handle.

Software Heritage’s Archive is one of the most relevant examples, but there are others at a smaller scale, such as osskb.org, the software detection service provided by the Software Transparency Foundation, another organization I collaborate with.

I often use the library analogy. When you look for a book in a library, you use a catalogue where every item is assigned a local identifier. This ID doesn’t just name the book; it provides a map to its exact location. Without this system, a librarian would spend hours searching the stacks.

However, each book also has an ISBN—a permanent, universal ID. While the ISBN identifies the work itself, it doesn’t tell you where that specific copy is shelved or how it’s categorized locally. In a library, identity alone isn’t enough; you also need location, which is why the ISBN is insufficient on its own.

And those additional identifiers need to be standardized, so all libraries work in the same way. Otherwise, there would be little interoperability among them. Imagine walking into two libraries, each with a completely different catalogue. You would be lost. Librarians could not move between them. Joint purchasing would be a nightmare, and so would sharing books.

Software is not conceptually very different from books. At some point, large or specialized organizations that manage large amounts of software need to define a catalogue. That brings us to the use of identifiers. Each organization, or each software management tool, can define its own identifier and its own catalogue. But then interoperability becomes nearly impossible.

Unlike a book, which is generally written by a very small number of authors, a coherent piece of software can be written by many developers, in many different languages, hosted on different platforms, managed in multiple ways, and integrated with different tools. It’s not just the tech giants; anyone in a modern supply chain is struggling to manage software that’s arriving from a hundred different directions at once. This is especially true in open source environments.

It’s easy to think these issues only affect a handful of specialists. But once software is everywhere, the struggle to manage it becomes universal. Today, we all feel the impact of not having a standardized way to identify software artefacts — from snippets or small blobs up to releases or complex, large snapshots. From software detection to vulnerability handling, from archiving to Software Bill of Materials (SBOM), from licence compliance to safety-critical systems development, from quality assurance to research reproducibility — the lack of a unified and standardized way of identifying software is limiting our ability to solve outstanding problems that the industry faces today. And it affects all of us, since software is everywhere.
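To make that range of artefacts concrete: as we read the published syntax, a core SWHID has the form swh:1:<type>:<hash>, where the type is one of cnt (content, such as a file or snippet), dir (directory), rev (revision), rel (release) or snp (snapshot), and the hash is 40 lowercase hexadecimal characters. Optional qualifiers, such as lines=..., can follow after semicolons. The Python sketch below is a simplified, illustrative parser under those assumptions; the names parse_swhid and SWHID are our own, and unlike a reference implementation it does not check qualifier names or percent-encoding.

```python
import re
from dataclasses import dataclass, field

# Object types of a core SWHID (scheme version 1), from smallest to largest artefact.
OBJECT_TYPES = {
    "cnt": "content (a file's bytes, e.g. a snippet or blob)",
    "dir": "directory (a source tree)",
    "rev": "revision (a commit)",
    "rel": "release (a tagged version)",
    "snp": "snapshot (the full state of a repository)",
}

# Core identifier: scheme "swh", version 1, object type, 40-hex-digit SHA-1 hash.
CORE_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


@dataclass
class SWHID:
    object_type: str
    object_id: str
    qualifiers: dict = field(default_factory=dict)


def parse_swhid(text: str) -> SWHID:
    """Illustrative parser: split off ';key=value' qualifiers, then validate the core."""
    core, *quals = text.strip().split(";")
    match = CORE_RE.fullmatch(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    qualifiers = {}
    for q in quals:
        key, sep, value = q.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed qualifier: {q!r}")
        qualifiers[key] = value
    return SWHID(match.group(1), match.group(2), qualifiers)
```

For example, parse_swhid("swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391;lines=1-2") yields object type "cnt" with the qualifier lines=1-2. The same five-letter-code structure covers everything from a single snippet up to a whole repository snapshot.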

No single organization, government, tool, or service can tackle this challenge alone. We need a standardized, worldwide solution — or a small number of solutions with extraordinary levels of compatibility.

What’s the main point of friction when you’re pitching this to tech executives and dev teams?

The ubiquity of Git, tighter security in package managers, the move toward machine-readable SBOMs, and new laws on software longevity are all converging. This is pushing more professionals to become familiar with different kinds of software identifiers. The friction appears as the amount of software they manage scales up: they start hitting limitations that come from not having an unambiguous, standardized way of identifying software.

In your talks, at what specific point does the concept finally click for the audience?

I partly answered this earlier, but the “Aha!” moment almost always comes with a demo.

One demo that always lands is showing how you can identify proprietary and open-source software exactly the same way. When people see an open source tool verify that both types of software are exactly what they claim to be, it clicks. They immediately see a solution to a problem they’ve been wrestling with inside their own organizations.

Another demo that audiences respond to well is one where you identify an open source snippet or a software release and confirm that it has not changed, using SWHID together with a tool from one of the standard’s reference implementations and the Software Heritage Archive. Then, as a second step, you confirm in isolation that the proprietary software a supplier actually delivered matches what the supplier’s SBOM states.
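The integrity check behind these demos can be sketched in a few lines. As we understand version 1 of the scheme, the hash in a content SWHID is computed the same way Git hashes a blob: SHA-1 over the header "blob <size>" plus a NUL byte, followed by the file’s bytes. Because only the bytes go into the hash, the same computation works for an open source snippet and for a proprietary file that never leaves the building. The helper names content_swhid and verify_file are hypothetical, and this sketch covers single files only; directories, releases and snapshots need the fuller algorithms provided by the reference implementations.

```python
import hashlib
from pathlib import Path


def content_swhid(data: bytes) -> str:
    """Compute a content SWHID (swh:1:cnt:...) from raw bytes.

    Assumes the v1 content hash matches Git's blob hash:
    SHA-1 of b"blob <size>\\0" followed by the content.
    """
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def verify_file(path: str, expected: str) -> bool:
    """Return True if the file's bytes still match the expected SWHID.

    Any ';qualifier=...' suffix on the expected value is ignored,
    since qualifiers do not affect the object's identity.
    """
    actual = content_swhid(Path(path).read_bytes())
    return actual == expected.split(";", 1)[0]
```

For example, content_swhid(b"") returns swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which carries the same hash Git reports for an empty file. In the second demo step, the expected value passed to verify_file would be the SWHID the supplier lists for that file in its SBOM.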

These demos highlight some of the strengths of SWHID in commercial environments that mix open source and proprietary software.
