March 18, 2020

Software Heritage contributes CodeMeta generator to the community

Software projects are a precious part of our technical, scientific and organisational knowledge, and that’s why Software Heritage’s mission is to collect, preserve and share all their source code.

To make it easy to discover the software projects you may be interested in among the tens of millions that we archive, we need proper metadata describing them. This is why we are investing significant energy in connecting with several communities that work on software metadata.

Today, we are happy to share a significant contribution we are making to one of these communities, CodeMeta, by releasing to the public a new tool, the CodeMeta generator, which makes it easy to create or edit metadata files conforming to the CodeMeta schema.

What is CodeMeta?

There are (too) many different metadata vocabularies out there: some are used for describing packages and their dependencies, others are used in Wikidata, and still others in various ontologies. The picture below presents just a small fraction of the complex landscape you are confronted with when looking for a way to describe a software project.


The CodeMeta project started an effort to address this issue, which led to the development of a sort of Rosetta stone for translating from one vocabulary to another: the CodeMeta crosswalk table.
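To make the idea of a crosswalk concrete, here is a minimal Python sketch that translates a few keys of an npm package.json file into CodeMeta terms. The mapping is an illustrative subset written for this example, not a copy of the official crosswalk table, and the function name crosswalk is hypothetical. A real converter would also normalise values (for example, turning a license identifier into an SPDX license URL) rather than copying them unchanged.

```python
# Illustrative subset of a package.json -> CodeMeta crosswalk.
# This mapping is an assumption for the example; consult the official
# CodeMeta crosswalk table for the authoritative correspondences.
PACKAGE_JSON_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "description": "description",
    "keywords": "keywords",
    "license": "license",
    "repository": "codeRepository",
}


def crosswalk(source: dict, mapping: dict) -> dict:
    """Rename the keys of `source` into terms of the target vocabulary.

    Keys without an entry in `mapping` are dropped, because the crosswalk
    offers no equivalent for them. Values are copied unchanged.
    """
    return {mapping[key]: value for key, value in source.items() if key in mapping}


if __name__ == "__main__":
    package_json = {
        "name": "example-tool",  # hypothetical project
        "version": "1.2.0",
        "license": "MIT",
        "repository": "https://example.org/example-tool.git",
        "scripts": {"test": "jest"},  # no equivalent in this subset
    }
    print(crosswalk(package_json, PACKAGE_JSON_TO_CODEMETA))
```

Keeping the mapping as plain data mirrors how a crosswalk table works: supporting another vocabulary means adding another table, not changing the translation logic.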

An important offspring of this effort is the CodeMeta vocabulary, an extension of the SoftwareApplication and SoftwareSourceCode classes found in the vocabulary of the popular schema.org initiative. Metadata conforming to the CodeMeta vocabulary can be represented in JSON-LD format, typically in a file named codemeta.json.
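As an illustration, the following Python sketch builds a minimal codemeta.json document. It uses the CodeMeta 2.0 JSON-LD context and a handful of common terms; the helper names make_codemeta and to_codemeta_json, and all project values in the usage example, are hypothetical. It is not how the CodeMeta generator itself is implemented.

```python
import json
from typing import List, Tuple

# JSON-LD context identifying the CodeMeta 2.0 vocabulary.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def make_codemeta(name: str, version: str, license_url: str,
                  repository: str, authors: List[Tuple[str, str]]) -> dict:
    """Build a minimal CodeMeta document as a Python dict."""
    return {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "license": license_url,  # here an SPDX license URL
        "codeRepository": repository,
        "author": [
            # Each author is a schema.org Person with given and family names.
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


def to_codemeta_json(metadata: dict) -> str:
    """Serialise the document as it would appear in a codemeta.json file."""
    return json.dumps(metadata, indent=2, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    doc = make_codemeta(
        name="example-tool",  # all values here are hypothetical
        version="1.2.0",
        license_url="https://spdx.org/licenses/MIT",
        repository="https://example.org/example-tool",
        authors=[("Ada", "Example")],
    )
    print(to_codemeta_json(doc))
```

Because the file is plain JSON with an @context entry, any JSON library can read it, while JSON-LD-aware tools can also resolve each term to its schema.org or CodeMeta definition.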

CodeMeta at Software Heritage

Here at Software Heritage, we decided early on to use the CodeMeta vocabulary internally when indexing intrinsic metadata, and we joined the CodeMeta task force in FORCE11, which is now stewarding the inclusion of the CodeMeta vocabulary into the schema.org vocabulary.

We also joined a growing consensus in the academic community by recommending, in our guidelines for saving and referencing research software, that a codemeta.json file be added to every research software repository. Indeed, including a codemeta.json file in a software repository is a simple and effective way to make your metadata accessible in a machine-readable form, and hence easily indexable by Software Heritage.
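To show why a codemeta.json file at the root of a repository is easy to index, here is a minimal Python sketch of how a harvester might pick up a few fields from it. It is hypothetical and is not Software Heritage's actual indexer: the names read_codemeta and summarize and the choice of fields are assumptions, and the file is read as plain JSON without JSON-LD expansion, which a real indexer would need in order to handle alternative contexts and term spellings.

```python
import json
from pathlib import Path
from typing import Optional

# Fields extracted by this sketch; an arbitrary, illustrative selection.
FIELDS = ("name", "version", "license", "codeRepository")


def summarize(doc: dict) -> dict:
    """Pick a few top-level CodeMeta terms, using None for missing ones."""
    return {field: doc.get(field) for field in FIELDS}


def read_codemeta(repo_root: str) -> Optional[dict]:
    """Read repo_root/codemeta.json as plain JSON and summarise it.

    Returns None when the repository has no codemeta.json at its root.
    No JSON-LD processing is done, so terms are matched literally.
    """
    path = Path(repo_root) / "codemeta.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        return summarize(json.load(f))


if __name__ == "__main__":
    # Summarise the codemeta.json of the current directory, if any.
    print(read_codemeta("."))
```

The point of the sketch is that no project-specific parsing is needed: a single well-known file name and a shared vocabulary are enough for an archive to extract the same fields from any repository.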

More recently, we adopted the CodeMeta representation, in the form of codemeta.json files, for describing landmark legacy software in the framework of SWHAP, the Software Heritage Acquisition Process, developed in collaboration with the University of Pisa and UNESCO.

The CodeMeta generator is here!

The need for a tool to easily create a new CodeMeta file or edit an existing one emerged clearly, and since no such tool was available, we decided to dedicate some of the Software Heritage engineering resources to developing it. A first prototype was showcased at the FORCE2019 Hackathon a few months ago.

Today, we’re delighted to announce that the repository of the CodeMeta generator has been moved under the CodeMeta umbrella and released under the AGPL licence, and you can contribute to its evolution right now on its repository.

The tool is implemented as a serverless web application that runs entirely in your browser, so you can run it locally or use the online version hosted on the CodeMeta website without any fear of overloading it.

Try it, and we’re confident all your software projects will soon have a machine-readable codemeta.json too!
