
December 12, 2023

Viewpoints on software in research at Gustave Eiffel University

Software Heritage, the persistent source-code archive: what is it and how to use it?

Interview with Céline Rousselot and Joenio Marques da Costa

On Nov. 13th, 2023, our ambassador Joenio Marques da Costa gave a talk at a “Rendez-vous Data” session organized by “DATA Univ Eiffel”, the data and software management cluster of Gustave Eiffel University. This local cluster belongs to the French national network of data management clusters, which is supported by the French Ministry for Higher Education and Research.

Hello Céline. Could you introduce yourself and explain your role?

My name is Céline Rousselot and I am a data librarian at Gustave Eiffel University. I work in the “Open Science department” (“Diffusion des Savoirs et Ouverture à la Société” in French), which is part of the “Research and Innovation” vice-presidency.
I support research teams in managing and opening their datasets, and I coordinate “DATA Univ Eiffel”, the university’s data management cluster.
In 2022, DATA Univ Eiffel joined “Recherche Data Gouv”, the French national network of clusters dedicated to data management. It provides training and support in data and software management, in order to raise awareness and disseminate best practices among research teams. It brings together expertise from various departments and vice-presidencies: the IT department, the legal department, the research and innovation vice-presidency, and the partnership and professionalization vice-presidency.

Hello Joenio. Could you tell us about your role?

My name is Joenio Marques da Costa. At the LISIS lab (Laboratoire Interdisciplinaire Sciences Innovations Sociétés), I’m a research software engineer and backend developer. In other words, I write research software to support the work of the researchers in the laboratory. Currently, I’m involved in two software development projects created and maintained by my team:
(1) CorTexT Manager (www.cortext.net) – A data analysis platform for citizens and researchers in the social sciences and humanities – https://docs.cortext.net, and
(2) Risis Core Facility (RCF) – A software web platform for the European Research Infrastructure for Science, Technology and Innovation Policy Studies (RISIS) – https://docs.risis.io

Joenio, why did you become a Software Heritage ambassador?

I became an ambassador because I’ve been following Software Heritage since its inception: I studied research software sustainability, visibility, preservation and recognition in my master’s thesis, “On the Sustainability of Academic Software: the Case of Static Analysis Tools”, published at the end of 2017.
So, when a friend from LISIS shared the open call for ambassadors with me, I decided to apply, because I was already advocating for Software Heritage within the lab.

Joenio, as an ambassador, what kind of support can you provide?

As an ambassador, I can present the Software Heritage archive and explain its main goals. I can also deliver live, hands-on demonstrations of its features. I like to teach! As I’m also a certified Carpentries instructor, I can apply my instructor skills to run workshops that teach good practices in software preservation, software development, software citation, open science and Free Software, and show how these topics can contribute to the whole scientific endeavor.

Besides that, I can contribute by writing documentation and by finding opportunities to connect the Software Heritage community with other communities I belong to, such as The Carpentries, the Debian project or the Live Coding community (an art and technology community that makes art with source code), among others.

“DATA Univ Eiffel” and its workshop series dedicated to data management

Céline, what is the target audience of an “Atelier de la donnée” (data workshop)?

The target audience of “DATA Univ Eiffel” is research teams: researchers, engineers and doctoral candidates. We have noticed that some support staff (e.g. the institutional and commercial relations department, project assistance, etc.) also attend our webinars and training courses.

Céline, what motivated you to introduce the topic of research software in a workshop series dedicated to data management?

Since the start of this year, DATA Univ Eiffel has regularly organized webinars called “RDV DATA” to present tools and best practices for data management and dissemination. These webinars cover software as well as data. Indeed, best practices for software are evolving, and data management and software management share many similarities.
As the university’s research teams develop and distribute open source software, we organized a dedicated “RDV DATA” on platforms for software development, sharing and preservation.

Software in research

Joenio, based on your experience as a trainer and as a research software engineer, what are the three most common misconceptions about software in research? And how do you address them?

I would say that the three most common misconceptions are:

(1) The belief that software source code is only useful for execution. Some people think that if their code is not very well written, or is poorly documented, there is no point in publishing it for other users. But software source code is an important artifact for society, as it contains knowledge worth sharing. (note: see “Open Source ensures code remains a part of culture” https://www.softwareheritage.org/2023/05/05/software-preservation/ )
Source code can be used in many ways other than execution: in the software engineering field, for instance, people extract source code metrics to better understand some internal aspects of a software project. For example, the size of a software project can be measured directly from its source code.

(2) Another common misconception is that if someone shares the source code of a research software project, their peers and colleagues will scrutinize its quality. As a result, some researchers who create research software believe that the code they wrote is not good enough, and they are reluctant to share it, fearing that the scientific community will judge it negatively.
In response, I would say that the community doesn’t evaluate source code quality this way. In my experience, when people look at the source code, they usually do so in order to improve it, out of genuine interest.

(3) A final common preconception is that once developers open their source code widely, they will receive tons of contributions requiring extra code review work.
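Regarding the first misconception, the point that a project’s size can be measured directly from its source code can be illustrated with a minimal Python sketch. It counts non-blank lines (comments included) per file extension under a directory. The function name `count_loc` and the default extension list are hypothetical choices for this example; dedicated metric tools compute far richer measures.

```python
from pathlib import Path


def count_loc(root, extensions=(".py",)):
    """Count non-blank lines per file extension under `root`.

    Hypothetical helper: comment lines are counted as code, and files
    whose suffix is not in `extensions` are ignored.
    """
    totals = {}
    # rglob("*") walks the directory tree recursively.
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in extensions:
            # Ignore undecodable bytes rather than failing on odd files.
            with path.open(encoding="utf-8", errors="ignore") as f:
                lines = sum(1 for line in f if line.strip())
            totals[path.suffix] = totals.get(path.suffix, 0) + lines
    return totals
```

For instance, `count_loc("path/to/repo", (".py", ".R"))` would return a dictionary mapping each listed extension found in the repository to its total number of non-blank lines.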

Joenio, what are the most frequent questions that people ask you when you deliver a talk or a training about research software?

I can’t say that one question systematically arises when I talk about research software, but a frequent concern is archiving: there are so many platforms for archiving digital artifacts (Zenodo, HAL, Figshare, Software Heritage, etc.). What is the best place to archive software? I usually recommend using Software Heritage to archive software source code.

Joenio, what would you say to an end-user who told you: “There are already too many platforms. I’m getting lost. Why should I use Software Heritage in addition to all the other existing solutions?”

I would say that it’s understandable to get lost among so many platforms, so many concepts, new words, new subjects, etc.
But I would remind them that software source code preservation is important, and that we must make sure all publicly available software source code is archived for the long term.
It’s important from many points of view: in research, it is crucial for enabling replication studies.
In addition, as many research software projects are funded with public money, it’s fair to share them back with society as a whole. This way, the knowledge encapsulated in the source code will be available to the next generations.
That said, I would add that the Software Heritage archive is the only infrastructure committed to archiving software source code for the long term, in a transparent way, and that’s why it’s worth spending time to understand it better.

The data steward’s role regarding research software

Joenio and Céline: suppose I have just started as a data steward at my university. Regarding research software, what should I look at first?

Joenio: I would first recommend looking at the FAIR principles, to understand why it is important to make research data Findable, Accessible, Interoperable and Reusable (https://www.go-fair.org/fair-principles/). As a second step, I recommend looking at FAIR for Research Software (https://fair-software.eu) to understand how to manage software development properly, in an open and transparent way. This can be done by publicly sharing the source code on a platform such as GitLab or GitHub. Or even better, by checking whether your university provides a GitLab instance, as Gustave Eiffel University does: https://gitlab.univ-eiffel.fr/.
As a third step, I believe it is important to nurture discussion of these topics within the institution by organizing workshops or training sessions on how to publish software source code, how to manage software development, and other good practices.
One good strategy is to take advantage of existing networks. For instance, The Carpentries provides a framework for running workshops dedicated to computational skills in a research context.

Céline: Newcomers working on software should get to know the institutional tools and platforms. Finding out where the experts are may also help.
On our university intranet, we provide research teams with information on software development, sharing and use, as well as on legal aspects. In addition to the intranet, a page of the university website is dedicated to data and software.
Another piece of advice would be to read the booklet “Open science – source code and software” written by the experts of the French Committee for Open Science (CoSO).

Newcomers may also contact their peers with software expertise. As part of DATA Univ Eiffel, we are setting up a network of DATA ambassadors. Some of them, like Joenio, are software experts. The list of the ambassadors is also available on the University intranet.

Take-home messages

Joenio and Céline: following the workshop delivered at Gustave Eiffel University, what take-home message would you like to share?

Céline: Attendance at the “RDV DATA” webinars is relatively high for our university. In my opinion, this shows that the webinars meet a need for information about research data and software. The format also seems well suited.
Workshops always take place during the lunch break, from 1pm to 2pm. People are generally more available at this time, and it is also easier to have informal discussions and peer-to-peer feedback. Webinars are recorded so that a replay is available.
The aim is to share experience. I would like to invite anyone from the university or elsewhere to suggest topics for these “RDV DATA”.

Joenio: I would say that this kind of event is crucial and that there is plenty of room to replicate it in different contexts, with different user profiles and in different research fields. For instance, it would be nice to schedule webinars with social scientists, economists, artists, and so on.
The “RDV DATA” webinar we delivered on November 13th, 2023 exceeded my expectations in terms of the number of participants. And it was a very pleasant surprise to see that at least five colleagues from my research unit also attended the session.

— Interview by Sabrina Granger

Useful resources

Slides by Joenio:
https://joenio.me/software-heritage-uge-data

Watch the RDV DATA replay:
https://clap.univ-eiffel.fr/permalink/v12666c5f7d7cnb97dk8/iframe/

Examples of Carpentries lessons: How to use Version Control with Git; R for Reproducible Scientific Analysis.

Nowogrodzki, A. (2019). How to support open-source software and stay sane. Nature, 571(7763), 133–134. https://doi.org/10.1038/d41586-019-02046-0

Phipps, S. (2023, May 5). Open Source ensures code remains a part of culture. Software Heritage. https://www.softwareheritage.org/2023/05/05/software-preservation/

Varoquaux, G. (2020, May 28). Technical discussions are hard; a few tips. Gael-Varoquaux.info. http://gael-varoquaux.info/programming/technical-discussions-are-hard-a-few-tips.html

Ask for an ambassador:
https://www.softwareheritage.org/ambassadors/

Would you like to share your experience of software in research? Please get in touch with us: https://www.softwareheritage.org/contact/
