December 18, 2025

Software rot: Saving science’s digital legacy

Modern research, from fundamental biology to the social sciences, runs on software. It’s the invisible engine powering discovery in virtually every field.

Data from the French Open Science Monitor reveals a staggering reality: in fundamental biology, researchers use software even more than they do in computer science, and its use is massive even in the humanities for tasks like statistical analysis. Software has become a fundamental layer for all of modern science. If you remove it, you have no research left.

But have you ever stopped to think about what this software really is, who it’s for, and how fragile it might be? The code that underpins today’s scientific breakthroughs isn’t just a set of disposable tools. It’s a complex, human-centric form of knowledge that is facing a quiet crisis.

This is one of the takeaways from a recent webinar hosted by the Circle U. European University Alliance with Mickaël Menager, Vice President of Open Science at Université Cité; Roberto Di Cosmo, Director of Software Heritage; and Laurent Gatto, who heads the Computational Biology and Bioinformatics Unit at the de Duve Institute.

Software is written for people, not just machines

The first thing to understand is a counter-intuitive truth: source code is a form of human-to-human communication, not just a set of instructions for a computer. The real knowledge, the scientific intent, is embedded in the human-readable text that developers write.

A perfect historical example is the source code for the Apollo 11 landing module, Di Cosmo says. Written in the 1960s, the raw assembly code is a cryptic series of instructions for the machine, but alongside it, engineers wrote copious comments in plain English, explaining their logic and intent. Those comments, intended only for other humans, are what allow us to understand the system’s design over 50 years later. As computer scientist Donald Knuth famously stated: “Programming is the art of telling another human being what we want the computer to do.”

This is critical because if you want to understand what a piece of research software truly does—to adapt it, reuse it, or build upon it—you need the source code. It’s the only way to understand exactly what the original designer had in mind.

Your research software is actively dying

According to computational biologist Gatto, we’re facing “the tragic death of open source research software.” This phenomenon, also known as “software collapse” or “software rot,” is the process by which software inevitably stops working if it’s not actively maintained.

Imagine an early-career researcher, months of work culminating in a precious dataset. They find the perfect software from a paper, thinking their results are just an arm’s length away. But then the collapse begins. First, the link to the software is dead. They track it down elsewhere, but it won’t install on a modern computer. With heroic effort they get it installed, only to find it works on the old test data but crashes on theirs. Finally, they get it to run, but the results are nonsensical. The data is effectively useless, locked away by dead code.

This digital decay represents a catastrophic loss of knowledge, but this crisis has not gone unanswered. A global effort is underway to build a permanent safeguard.

Software Heritage has created a “universal library” for code, working to save science

To combat the crisis of software collapse, a global, non-profit infrastructure called Software Heritage was founded in 2015. The mission is monumental: to “collect, preserve, and make easily available all the publicly available software source code on the planet.”

It’s the largest software archive ever built, containing over 400 million projects and 26 billion unique source files. Acting as a universal safety net, it proactively archives code from thousands of different origins. The value of this effort was powerfully demonstrated when an astrophysics researcher discovered that all his code had been deleted from the commercial platform Bitbucket. To his immense relief, he found that Software Heritage had already archived all of it, saving his work from oblivion.
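For readers who want to check whether their own repository is already in the archive, here is a minimal Python sketch. It assumes the archive’s public REST API exposes an origin lookup at `/api/1/origin/<origin URL>/get/` and answers with HTTP 404 for an unknown origin; the endpoint path, the helper names `origin_lookup_url` and `is_archived`, and the repository URL in the usage note are illustrative assumptions, not something covered in the webinar.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed public API root of the Software Heritage archive.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def origin_lookup_url(origin_url: str) -> str:
    """Build the (assumed) lookup URL for a repository origin."""
    # Keep ':' and '/' unescaped so the origin stays readable in the path.
    quoted = urllib.parse.quote(origin_url, safe=":/")
    return f"{API_ROOT}/origin/{quoted}/get/"


def is_archived(origin_url: str, timeout: float = 10.0) -> bool:
    """Return True if the archive reports knowing this origin, False on HTTP 404."""
    request = urllib.request.Request(
        origin_lookup_url(origin_url),
        headers={"Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            json.load(response)  # parsed only to confirm the body is valid JSON
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # rate limits or server errors are not evidence either way
```

Calling `is_archived("https://github.com/example/research-code")` with a real repository URL in place of the hypothetical one performs a single network request. A known origin does not by itself guarantee that the most recent version of the code has been captured, so treat a `True` result as a starting point rather than proof of a complete copy.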

While this ‘Library of Alexandria’ for code solves the problem of disappearance, it raises a deeper question: What is the ultimate goal of all this preservation? The answer goes beyond simple access.

The goal isn’t just reproducibility, it’s trust

“Reproducibility” is a major buzzword in science today. Making software open and available is a vital first step, but as Gatto points out, it’s not the ultimate goal. “The reason we want these things,” he states, “is because they lead to trust.”

Researchers need to be able to trust that the software they use works correctly and provides the right answer. A perfectly reproducible “black box” is not scientifically useful if we can’t understand how it works. Di Cosmo illustrates this with a powerful analogy: when a judge issues a verdict, you want a clear, logical explanation based on law and fact, not a brain scan showing which neurons fired. You need human-understandable logic. This same principle is driving the growing demand for “explainable AI,” where the reasoning behind a model’s decision is just as important as the decision itself.

The role of AI in resurrecting lost science

While AI can create “black box” challenges, it also offers a revolutionary solution for preserving old software. Modern AI models are exceptionally good at understanding and translating code between different programming languages.

This presents what Di Cosmo calls a game-changer for science. You can now take a 10-year-old piece of source code, written in an obsolete language, feed it to an AI, and ask it to port it to a modern, usable language. “[With AI] you can convert it from Python to Rust, from Pascal to something else, and it’s a new avenue we should explore.”

This is tedious, low-level translation work that humans are notoriously bad at and dislike, he points out. But it’s a task that modern AI, which excels at pattern recognition and instruction-following, can perform well. This makes a comprehensive archive like Software Heritage even more useful. It means the vast repository of legacy code is not just a digital museum; it’s a living library of knowledge that AI can help unlock.

Recognizing software as scientific output

The code running our world is more than just a temporary tool. It’s an essential record of scientific knowledge, a critical component of the research history that must be preserved alongside the papers it supports.

As the speakers noted, this heritage is fragile and requires active preservation and understanding to keep it alive and useful.

It’s time to stop treating software as a disposable utility and recognize it as a permanent and invaluable scientific asset. As we generate more knowledge encoded in software than ever before, there’s a collective responsibility to ensure this legacy remains alive and trustworthy for the next generation of discoverers.

Catch the full 40-minute session on YouTube.
