December 3, 2020

Bringing software to the limelight in the Research Data Alliance

“Sometimes, if you don’t have the software, you don’t have the data”

Christine Borgman, Paris, 2018

Supporting Open Science and reproducibility of research is part of our mission: we are are building the essential infrastructure to support archival and reference of source code, supporting the scholarly echosystem, we contribute to community efforts to describe research software with proper metadata, we released the first bibliographic style ever designed to cite software, and we have been contributing for years to raise awareness about the importance of software in general, and as a key ingredient for academic research, on a par with articles and data.

As part of this effort, Software Heritage is actively participating in the Research Data Alliance (RDA), where we introduced the first software session in the 9th plenary in 2017 in Barcelona where the focus group on Software Source code: Sharing, Preservation and Reproducibility was held.

Attending the Research Data Alliance plenary VP16 organized virtually from Costa Rica, we are delighted to see how, in just three years, software has become a hot subject, with six different sessions dedicated to it:

Software Source Code IG meeting where we presented the recently adopted RDA/FORCE11 output[1], for which a community review was held from July to September 2020.
FAIR software roadmap BOF meeting
FAIR for research software (FAIR4RS) WG meeting
Computational Notebooks BOF meeting
CURE-FAIR WG meeting
Research Data Management in Engineering: Data Provenance and Research Software in Engineering IG meeting

This raising interest is good news for the acnowledgement of the role of software in academia, and it is important to move away from the usual habit of handling software “just like data”.

Software Source Code Identification

The way that lies ahead is long, but we are steadily moving forward: we have recently announced the landmark document from the EOSC Scholarly Infrastructures for Research Software task force, and we are happy to announce the availability of the final output from the Source Code Identification Working Group, a joint Force11 and RDA effort spawned from discussions both on the RDA’s Software Source Code IG and FORCE11’s Software Citation Implementation WG, recognizing that software is a special kind of object, and that its identification needs to be specifically addressed taking into account the various existing identifier schemes for software. Indeed, after the work of the FORCE11 Software Citation WG introducing the Software Citation Principles[2], it was clear that unique identification, persistence and specificity are important for citation, but there is still a gap between the principles and the reality of the current state of the art of software identification.

The output[1] produced by the working group contains a detailed survey of different systems of identifiers for software, and of their usage in different use cases, in an harmonized way, and its final version integrates feedback obtained through an open community review process.

An important step was to recognize that software is a complex concept, so a scale for the granularity level of the digital artifacts involved was introduced, which helped in the specification and analysis of the use cases.

The output describes a large panorama of identifiers schemes, both extrinsic and intrinsic, and designates the identification targets they can cater to.

The findings are collected in a complete table that associates identifiers schemes to identification targets, and emphasizes the fact that there is no one identifier schema that works for all use cases.

Combining multiple identifiers to cover all the facets of software appears hence as a necessity, if we want to capture all the essential use cases: discoverability, access, persistence, reproducibility and reuse.

This output establishes the necessary groundwork on top of which a set of recommendations may be drafted.

The working group has completed its mandate but you can continue the conversation in the Software Source Code Interest Group. Join the discussion!

[1] Research Data Alliance/FORCE11 Software Source Code Identification WG, Allen, A., Bandrowski, A., Chan, P., Di Cosmo, R., Fenner, M., Garcia, L., Gruenpeter, M., Jones, C. M., Katz, D. S., Kunze, J., Schubotz, M. & Todorov, I. T. (2020). Use cases and identifier schemes for persistent software source code identification (V1.1). Research Data Alliance. https://doi.org/10.15497/RDA00053

[2] Smith, A. M., Katz, D. S., & Niemeyer, K. E. (2016). Software citation principles. PeerJ Computer Science, 2:e86. https://doi.org/10.7717/peerj-cs.86

Software Heritage

Software Source Code Identification

[2] Smith, A. M., Katz, D. S., & Niemeyer, K. E. (2016). Software citation principles. PeerJ Computer Science, 2:e86. https://doi.org/10.7717/peerj-cs.86

Follow us