“Software should be considered a legitimate and citable product of research.” — Software citation principles
“Non-reproducible single occurrences are of no significance to science.”
— Karl Popper, The Logic of Scientific Discovery, 1934
“Sometimes, when you do not have the software, you do not have the data.”
— Christine Borgman, JNSO 2018
Guiding you all along the journey
Software Heritage lets you seamlessly archive your research software artifacts and add to your research articles precise references to specific versions of the source code, down to fragments of individual source files. This significantly improves the experience of the reviewers of your work (for example, artifact evaluation committees) and, more generally, of all your readers (one of whom could even be you in a few weeks, months, or years!).
Make sure your source code is hosted on a publicly accessible repository (GitHub, Bitbucket, a GitLab instance, an institutional software forge, etc.) using one of the version control systems supported by Software Heritage, currently Subversion, Mercurial and Git. Following well-established best practices, include the following files at the top level of your source code tree:
Once your code repository has been properly prepared and is up to date, you need to:
That’s it, it’s all done! No need to create an account or to provide personal information of any kind. If the URL you provided is correct, Software Heritage will archive your repository, with its full development history, shortly afterwards. If your repository is hosted on one of the major forges we already know, this process will take just a few hours; if you point to a location we have never seen before, it can take longer, as we will need to approve it manually. Note that you can also request archival programmatically, using the dedicated Software Heritage API entry point.
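As an illustration of the programmatic route, here is a minimal Python sketch. It assumes the "Save Code Now" endpoint of the public Software Heritage Web API, which takes a visit type and an origin URL in its path; the function names `save_request_url` and `request_archival` are hypothetical helpers chosen for this example, and the exact response format should be checked against the API documentation.

```python
import json
import urllib.request

# Base of the public Software Heritage Web API (assumed to be version 1).
SWH_API = "https://archive.softwareheritage.org/api/1"


def save_request_url(origin_url: str, visit_type: str = "git") -> str:
    """Build the save-request endpoint URL for a repository.

    visit_type names the version control system, e.g. "git", "hg" or "svn".
    """
    return f"{SWH_API}/origin/save/{visit_type}/url/{origin_url}/"


def request_archival(origin_url: str, visit_type: str = "git") -> dict:
    """Send the save request with a POST and return the decoded JSON reply.

    This needs network access; it is defined here but not called.
    """
    req = urllib.request.Request(
        save_request_url(origin_url, visit_type), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example call (commented out so the sketch has no side effects):
# request_archival("https://github.com/rdicosmo/parmap")
```

Keeping the URL construction separate from the network call makes it easy to check the endpoint you are about to hit before actually submitting a request.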
Once your source code has been archived, there are many ways to reference it in your article. Three common use cases are:
The full repository
The link to the full repository archived in Software Heritage (with all its development history) is obtained by prepending the prefix https://archive.softwareheritage.org/browse/origin/ to the URL you used to request its archival. For example, if the repository you have saved is https://github.com/rdicosmo/parmap, then the link to the saved version in Software Heritage will be
https://archive.softwareheritage.org/browse/origin/https://github.com/rdicosmo/parmap/
Following this link, your readers can browse the contents of your repository extensively, delving into its development history, and/or directory structure, down to each single file.
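If you reference several repositories, the prefixing rule above is easy to script. The following is a small sketch only; `browse_url` is a hypothetical helper name, and the trailing slash is normalized so that the result matches the Parmap example.

```python
BROWSE_PREFIX = "https://archive.softwareheritage.org/browse/origin/"


def browse_url(origin_url: str) -> str:
    """Return the archive browse link for a repository URL."""
    # Strip any trailing slash first so we always end with exactly one.
    return BROWSE_PREFIX + origin_url.rstrip("/") + "/"


print(browse_url("https://github.com/rdicosmo/parmap"))
```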
Specific version of the project
Software Heritage provides a fully documented standard identifier schema, called SWH-ID, for pointing to the precise version of all the source code it archives. For example, the following SWH-ID identifies a precise version of the source code of Parmap:
swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=https://github.com/rdicosmo/parmap
SWH-IDs can be turned into clickable URLs by prepending the prefix https://archive.softwareheritage.org/ to them. So the following (hyper)link takes you directly to a page in Software Heritage showing that precise version (try it!):
https://archive.softwareheritage.org/swh:1:rev:0064fbd0ad69de205ea6ec6999f3d3895e9442c2;origin=https://github.com/rdicosmo/parmap
A very simple way of getting the right SWH-ID is to browse your archived code in Software Heritage and navigate to the revision you are interested in. Then click on the vertical red permalinks tab that is present on all pages of the archive and, in the tab that opens up, select the revision identifier.
Version 1 of the SWH-IDs uses Git-compatible hashes, so if you use Git as your version control system, you can create the right SWH-ID by simply prepending swh:1:rev: to your full commit hash.
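The two rules above (prefix a Git commit hash with swh:1:rev:, then prefix the identifier with the archive address) can be sketched in a few lines of Python. The helper names `revision_swhid` and `swhid_url` are hypothetical, and the check for a full 40-character hash is an assumption of this sketch: abbreviated hashes, as printed by some Git commands, are rejected. A full hash can be obtained with `git rev-parse HEAD`.

```python
import re
from typing import Optional

ARCHIVE = "https://archive.softwareheritage.org/"


def revision_swhid(commit_hash: str, origin: Optional[str] = None) -> str:
    """Build a revision SWH-ID from a full Git commit hash.

    If origin is given, it is appended as the origin= qualifier.
    """
    if not re.fullmatch(r"[0-9a-f]{40}", commit_hash):
        raise ValueError("expected a full 40-character lowercase hex hash")
    swhid = f"swh:1:rev:{commit_hash}"
    if origin:
        swhid += f";origin={origin}"
    return swhid


def swhid_url(swhid: str) -> str:
    """Turn a SWH-ID into a clickable archive URL."""
    return ARCHIVE + swhid


parmap = revision_swhid(
    "0064fbd0ad69de205ea6ec6999f3d3895e9442c2",
    origin="https://github.com/rdicosmo/parmap",
)
print(swhid_url(parmap))
```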
Code fragment
SWH-IDs as supported by Software Heritage allow you to go even further and pinpoint a given fragment of code inside a specific version of a file, by using the lines= qualifier available for identifiers that point to files. For example, the following SWH-ID points to the core mapping algorithm inside the Parmap source code as presented in a research article describing Parmap back in 2012:
swh:1:cnt:d5214ff9562a1fe78db51944506ba48c20de3379;origin=https://github.com/rdicosmo/parmap;lines=101-143
Test it by clicking on this link: you will be brought seamlessly to the Software Heritage archive on a page showing the corresponding source code, with the relevant lines highlighted. Here too, you can get the exact link from the archive: navigate to the code fragment you are interested in, click on the line number of the first line of the fragment, shift-click on the last one, and then open the permalinks tab.
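Fragment identifiers can also be assembled by hand. The sketch below assumes that, since version 1 SWH-IDs use Git-compatible hashes, the identifier of a file's content is the SHA-1 that Git computes for a blob (the bytes "blob", a space, the content length, a NUL byte, then the content). The helper names `content_swhid` and `fragment_swhid` are hypothetical, and the qualifier order (origin before lines) simply follows the Parmap example.

```python
import hashlib
from typing import Optional


def content_swhid(data: bytes) -> str:
    """Compute the content SWH-ID of a file from its raw bytes.

    Uses the Git blob hashing scheme: SHA-1 over a small header
    followed by the file content.
    """
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def fragment_swhid(swhid: str, first: int, last: Optional[int] = None,
                   origin: Optional[str] = None) -> str:
    """Append origin= and lines= qualifiers to a content SWH-ID."""
    if first < 1 or (last is not None and last < first):
        raise ValueError("invalid line range")
    if origin:
        swhid += f";origin={origin}"
    lines = str(first) if last is None else f"{first}-{last}"
    return f"{swhid};lines={lines}"


# Rebuild the Parmap fragment identifier from its parts.
print(fragment_swhid(
    "swh:1:cnt:d5214ff9562a1fe78db51944506ba48c20de3379",
    101, 143,
    origin="https://github.com/rdicosmo/parmap",
))
```

Hashing a local file with `content_swhid` and comparing the result with the identifier shown in the permalinks tab is a quick way to confirm that the archived file is byte-for-byte the one you have.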
The latest version of Software Heritage’s research software guidelines and walkthrough, with LaTeX examples, can be downloaded here.
The source code of the guide itself is available and distributed under a CC-BY 4.0 license. You are welcome to help improve it. It will be updated regularly.
A mailing list for sharing information among the scientific community on research aspects of Software Heritage is available, and you are welcome to join.