December 10, 2025

Institut Pasteur: Advancing software recognition

Libraries advance teaching, research, and learning by providing resources, enabling discovery, and offering expert guidance. As software source code becomes increasingly central to contemporary scholarship, libraries must support researchers who work with it. In this series of interviews, professionals share their approach to research software.

As a data librarian at the Institut Pasteur’s Scientific Information Resources Center (CeRIS), Fanny Sébire guides scientists through the entire lifecycle of their research data, in fields ranging from microscopy to clinical studies.

Beyond assisting with data management, reporting, and archiving, she plays a vital collaborative role: developing institutional IT tools and contributing to policy alongside other research support teams, while also coordinating the Open Science blog.

Bertrand Néron is a bioinformatics research software engineer at the Institut Pasteur and a Software Heritage Ambassador. He develops IT solutions to research challenges and oversees the entire lifecycle of these tools, from architectural design and development through to production. Néron is deeply interested in the challenges surrounding research reproducibility.

In this interview, Sébire and Néron detail their collaborative efforts, explaining how their complementary skills are being used to change the institutional understanding of the role of source code and software.

Key takeaways:

Given the vast diversity of research software, expecting standardized development and management practices is unrealistic.
The perception of software must change: this is fundamentally a cultural shift, not a technical adjustment.
Training in proper archiving procedures should be mandatory for all code producers, including researchers, engineers, and students.

What foundational resources on software engineering do you suggest for a librarian?

Fanny Sébire: I’d recommend our blog, where we regularly discuss best practices in software development. Regarding open source code and software, I recommend the “Passport to Open Science” collection from the French Ministry of Higher Education and Research, which includes a booklet dedicated to software. And to learn about software deposit in HAL, the DORANum guide is very clear.

As an information specialist, what has surprised you most about working with research software?

Fanny Sébire: When I started working on research data, I quickly realized that practices varied greatly among researchers: each research team had its own way of organizing, naming, describing, storing, and disseminating data. I expected research software to be different: I assumed engineers would follow more standardized practices, particularly by using common processes or tools. That turned out not to be the case. For example, using GitHub or GitLab is not yet standard practice at the Institut.

How does your organization help researchers get and use the software they need?

Fanny Sébire: In May 2021, the Institut Pasteur published its institutional policy setting out guidelines for the management and sharing of data and software code. It summarizes best practices to be implemented throughout the research process and refers to fact sheets that help scientists put them into practice.

These are recommendations, not obligations:

Develop a Software Management Plan (SMP) to guide the reflection and planning stages of all development and management activities.
Submit the software created at the Institut Pasteur to the software forge provided by the institution.
Assign an open license to any software that can be publicly shared.
Actively promote the software to the scientific community by listing it in the bio.tools registry and on the research.pasteur.fr website.
Archive all public source code in Software Heritage and deposit it in the HAL-Pasteur portal to ensure proper citation and increase visibility.
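For the last recommendation, a public repository can be submitted to Software Heritage through its "Save Code Now" service. The Python sketch below assumes only the public web API endpoint `POST /api/1/origin/save/<visit_type>/url/<origin_url>/` on archive.softwareheritage.org; the repository URL used later is hypothetical, the visit types are restricted to three common ones for illustration, and authentication, rate limits, and error handling are deliberately left out.

```python
import urllib.request

SWH_API = "https://archive.softwareheritage.org/api/1"

# Illustrative subset of repository types accepted by Save Code Now.
VISIT_TYPES = {"git", "hg", "svn"}


def save_code_now_url(origin_url: str, visit_type: str = "git") -> str:
    """Build the Save Code Now endpoint for a public repository URL."""
    if visit_type not in VISIT_TYPES:
        raise ValueError(f"unsupported visit type: {visit_type!r}")
    # The repository URL is embedded verbatim in the endpoint path.
    return f"{SWH_API}/origin/save/{visit_type}/url/{origin_url}/"


def request_archival(origin_url: str, visit_type: str = "git") -> bytes:
    """POST a save request and return the raw response body (no auth, no retries)."""
    request = urllib.request.Request(
        save_code_now_url(origin_url, visit_type), method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Calling `request_archival("https://gitlab.example.org/team/tool")` would ask Software Heritage to archive that (hypothetical) repository; the service answers with a JSON description of the save request, which is why the function returns the raw body for the caller to decode.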

Bertrand Néron: The institutional policy and recommendations support open and reproducible science. This requires making all eligible research code publicly available, systematically assigning a distribution and use license, and archiving the code in Software Heritage to ensure the traceability and reproducibility of associated publications.

Since 2021, the Institut Pasteur’s open access charter has made depositing publications in HAL mandatory. However, software deposit in HAL remains little known, and many researchers are unfamiliar with Software Heritage. I’m currently collaborating with CeRIS to ensure that research code receives the same level of attention and support as publications.

What is the library’s strategy for managing research data and related source code?

Fanny Sébire: I collaborate closely with the Institut Pasteur’s data management platform team. We use Data Management Plans (DMPs) as essential tools to prompt scientists to address crucial planning questions as early as possible. My role involves reviewing these DMPs, which leads me to guide researchers toward appropriate methodological approaches, tools, and institutional contacts. For instance, researchers with questions regarding GitLab usage or software licensing can be directly referred to Bertrand Néron.

We also intend to use DMPs to encourage researchers to archive their data and source code when their projects end. We’re currently collaborating with the Archives and Information Systems Departments to develop a solution for the electronic archiving of data, code, and closed software. Public code, on the other hand, will be archived in Software Heritage. Our main objective is to ensure that all archived data and software are thoroughly documented and contextualized to guarantee their long-term intelligibility and usability.

In addition, my colleagues at the library who moderate HAL-Pasteur would like to raise awareness about software deposit and its link with Software Heritage by:

Organizing workshops and training sessions where participants can actively submit and archive their code.
Identifying publications and preprints submitted to HAL-Pasteur that include a code repository link, then proactively contacting the authors to offer assistance with code archiving and submission to HAL.

*Bertrand Néron, software engineer, Institut Pasteur*

Bertrand Néron: I believe we currently have sufficient technical solutions available for effectively managing research code and software. The primary obstacle, however, is not technical, but cultural. Code and software are still often viewed not as scientific outputs in their own right, but merely as secondary artifacts.

The next steps involve two key areas. The first is a cultural shift: raising researchers’ awareness of the importance of code and software in academic work and encouraging them to prioritize proper source code management, while working with evaluation bodies to increase the value assigned to code and software production and to promote software engineering best practices. The second is developing guides and offering training on practical solutions, such as the SWH-HAL integration, to help improve the reproducibility of research results.

How do the library and the Software Heritage Ambassador work together?

Fanny Sébire: My collaboration with Bertrand [Néron] began during a data meeting co-organized in 2020 with Anne-Caroline Delétoille, head of the data management platform. The meeting focused on the management of research code and software, and we invited Bertrand to present on software development best practices. Since then, we’ve maintained a regular exchange and advisory relationship. For instance, we’ve co-organized meetings with research unit managers to present recommendations for improving data and software management.

We plan to further develop certain topics in the coming months:

Software Management Plan (SMP) Implementation: Bertrand [Néron] recently completed his first SMP following training. We can leverage his hands-on experience to encourage other engineers and researchers to adopt and implement their own SMPs.
Source Code Archiving Procedures: We will promote and implement our dual archiving procedure—archiving public code in Software Heritage and closed code internally. Bertrand has already tested the internal procedure designed in collaboration with the Archives division for closed code.

Bertrand Néron: I’m a research engineer at the Bioinformatics and Biostatistics Hub, a department of around 40 engineers that supports scientists with bioinformatics and biostatistics analyses and with research application development. My personal focus is primarily on developing applications for the automatic annotation of bacterial genomes.

Within my unit, I’m a member of a working group tasked with rewriting our project management guides, which include comprehensive recommendations for both analysis and development. For instance, the latest version includes new guidelines for drafting Software Management Plans (SMPs) and for archiving public code in Software Heritage. This work stems from our collaboration with the data management platform and with Fanny [Sébire] from CeRIS.

The goal is to offer Institut Pasteur staff simple, robust, and sustainable solutions, such as:

Archiving all public source code in Software Heritage.
Editing software metadata in HAL.
Using the internal archiving solution to incorporate contextual elements (e.g., emails, calls for tenders) and cross-reference different materials belonging to the same project.

This approach allows us to avoid duplicating existing information and leverages Software Heritage to address reproducibility challenges. Furthermore, publishing software descriptions in HAL makes the software easier to discover. Through these efforts, we directly contribute to reproducible and open science, a key strategic priority for the Institut Pasteur. This new policy is currently awaiting approval. We hope to implement it quickly within the Bioinformatics and Biostatistics Hub and then roll it out to other units.

Where does Software Heritage integrate with, and add value to, your existing services?

Fanny Sébire: The obvious synergy is between our HAL-Pasteur portal and Software Heritage, which is greatly facilitated by the interconnection between the two. By opening up and preserving publications and source code, both tools contribute to more open and reproducible science.

I also believe that there could be synergy between Software Heritage and our Archives division, as both are dedicated to preserving scientific heritage. I think it is important to work together: using Software Heritage to preserve code, and supporting scientists so that they can archive all documents related to this code internally. The response to the call for projects, the slideshow presenting the project, and the activity report all help to understand the context in which the code was produced.

Bertrand Néron: In the life sciences, most research code is created to generate knowledge, often measured in terms of the publications that result from it. From this perspective, code is deemed less important than articles because it is seen only as a means to an end, not as an end in itself. However, to move towards more reproducible research, this perception needs to change. By joining the Software Heritage Ambassadors program, I benefit from first-hand information on available solutions. I also have access to educational resources to inform and train my colleagues on best practices for code management. And I can meet ambassadors in related fields to discuss their actions, draw inspiration from them, or collaborate.

Finally, being an ambassador gives me legitimacy within my institution: my superiors identify me as a point of contact for raising awareness of the importance of code. I explain the benefits for users and for the institution.

What are the biggest obstacles in getting researchers to archive their source code?

Fanny Sébire: In my opinion, the difficulties are twofold. First, too few researchers at the Institut Pasteur are familiar with Software Heritage. Second, there is a lack of understanding regarding the necessity of formal archiving, as many believe that existing code repositories are sufficient. Consequently, they underestimate the associated risks.

For the library, it’s important to encourage software authors to enrich the source code description by completing a codemeta.json file and submitting the software to HAL. This is where we face the greatest difficulty: researchers frequently perceive this essential step as an unnecessarily time-consuming administrative task.

Bertrand Néron: One significant issue is the low awareness of Software Heritage. Researchers and students commonly treat software forges as if they were permanent archiving solutions. They consistently underestimate the potential risks associated with code hosting, particularly when managed by a private entity. Recent decisions by the US government, however, have clearly highlighted the urgent challenges involved in archiving research data and code.

Furthermore, a structural obstacle persists: journals fail to adequately recognize the specific characteristics of software. For instance, reviewers commonly request a DOI for source code instead of an SWHID. Even when authors provide SWHIDs, these deeply ingrained editorial habits are hard to change if the journal’s existing workflow does not specifically accommodate them.
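Unlike a DOI, which is registered with an agency, an SWHID is an intrinsic identifier computed from the archived content itself. Its core form is `swh:1:<object type>:<40 hexadecimal characters>`, where the type is one of `cnt`, `dir`, `rev`, `rel`, or `snp`, optionally followed by `;key=value` qualifiers such as `origin` or `lines`. As a minimal sketch (not Software Heritage's own parser), a journal or repository workflow could at least check that an identifier supplied by authors has this shape:

```python
import re

# Core SWHID plus optional ";key=value" qualifiers (e.g. origin, lines).
SWHID_RE = re.compile(
    r"^swh:1:(?P<type>cnt|dir|rev|rel|snp):(?P<hash>[0-9a-f]{40})"
    r"(?P<qualifiers>(?:;[a-z]+=[^;]+)*)$"
)


def parse_swhid(swhid: str) -> dict:
    """Split an SWHID into object type, hash, and qualifiers; reject anything else."""
    match = SWHID_RE.match(swhid)
    if match is None:
        raise ValueError(f"not a valid SWHID: {swhid!r}")
    qualifiers = dict(
        part.split("=", 1)
        for part in match.group("qualifiers").split(";")
        if part  # skip the empty string before the first ";"
    )
    return {
        "type": match.group("type"),
        "hash": match.group("hash"),
        "qualifiers": qualifiers,
    }
```

The check is purely syntactic: it confirms the format, not that the object actually exists in the Software Heritage archive.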

Finally, researchers do not perceive the added value of metadata. While the long-term return on investment is significant for everyone, completing a CodeMeta file is still perceived as a superfluous task. This is compounded by a misconception: some mistakenly believe that codemeta.json files are specific to archiving in Software Heritage. However, CodeMeta is an established standard. Including this intrinsic metadata file in a code repository is a general best practice, independent of the chosen archiving solution (e.g., Zenodo, Figshare).
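To give a sense of how small this task can be, here is a Python sketch that builds a minimal codemeta.json with CodeMeta 2.0 terms (`name`, `description`, `license`, `codeRepository`, `author`). The project name, author, and repository URL passed to it are placeholders, and a real file would usually carry more fields, such as version or programming language.

```python
import json
from pathlib import Path

CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def build_codemeta(name, description, license_spdx, code_repository, authors):
    """Return a minimal CodeMeta 2.0 record as a dict.

    authors is a list of (given_name, family_name) tuples.
    """
    return {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        # License expressed as an SPDX license URL.
        "license": f"https://spdx.org/licenses/{license_spdx}",
        "codeRepository": code_repository,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


def write_codemeta(record, directory="."):
    """Write the record as codemeta.json in the given repository directory."""
    path = Path(directory) / "codemeta.json"
    path.write_text(
        json.dumps(record, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return path
```

Because the file sits in the repository itself, it travels with the code to whichever archive is used, which is the point made above: CodeMeta is not tied to Software Heritage.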

What key partnerships are necessary to most effectively ensure the long-term sustainability of research software source code?

Fanny Sébire: I believe it would be highly beneficial to strengthen our collaboration with the archives departments of universities and research institutions. I wonder to what extent they are all familiar with Software Heritage, even though they could be strong advocates for the platform among scientists.

Furthermore, given that data and code are frequently published together, I believe that collaboration with data repositories would be highly beneficial. This partnership would not require a technical interconnection between Software Heritage and existing data repositories. Instead, data repositories could simply inform depositors that indicating an SWHID is preferable to a forge link, thus ensuring a permanent link between the data and the corresponding code.

Bertrand Néron: Anybody who produces source code, whether researchers, engineers, or students, should be trained in archiving it in Software Heritage. The essential strategy is to use concrete examples drawn directly from their respective fields to demonstrate the benefits of archiving. These courses should be hands-on, to show how easy it is to put into practice. For example, training sessions on software submission to HAL are being organized in November 2025.

At the Institut Pasteur, students and postdocs are highly productive code contributors, but their high turnover means that their expertise and code often leave the institution. The challenge is therefore to sustain training efforts over time so that they keep pace with the regular turnover of the people who need support.

Finally, it’s essential to raise awareness among career evaluation committees so that software is valued in its own right, not only through its related publications. I’m confident that granting code a more significant role in evaluations would motivate researchers, engineers, and students to prioritize its visibility, specifically by submitting it to HAL.

Identify software mentions in articles

Identifying publications and preprints that include a code repository link or a software mention is challenging, yet essential for better evaluating software impact. However, the highly heterogeneous forms these mentions can take make them difficult to monitor. The SoFAIR project will improve and semi-automate the process of identifying, describing, registering, and archiving research software, ensuring that it receives a SoftWare Hash IDentifier (SWHID). It will extend the capabilities of critical and widely used open scholarly infrastructures (CORE, Software Heritage, HAL) and tools (GROBID) operated by the consortium partners, delivering and deploying an effective solution for managing the research software lifecycle.

Learn more about the SoFAIR project

Up next

Catch the next interview in our upcoming series with leading librarians.
Discover the previous interviews
Support Software Heritage, become a sponsor
Check out the resources on open science produced by Software Heritage
Implement software deposit in HAL:
- Guide for the end-users
- Guide for the moderators (FR)
- Software deposit tutorials series
- Train the trainer resources: train staff to become effective helpers
- To request training (moderation or trainer training), please contact the Center for Direct Scientific Communication:
Email: sebastien.mazzarese/@/ccsd.cnrs.fr
Advocate for software deposit among research management.

Read the Software Heritage Open Science blueprint
Di Cosmo, R., Granger, S., Hinsen, K., Jullien, N., Le Berre, D., Louvet, V., Maumet, C., Maurice, C., Monat, R., & Rougier, N. P. (2025). Stop treating code like an afterthought: Record, share and value it. Nature, 646(8084), 284–286. https://doi.org/10.1038/d41586-025-03196-0


Software Heritage