Paris-based postdoc on software provenance

  • Contract type: Public service fixed-term contract
  • Renewable contract: Yes
  • Level of qualifications required: PhD or equivalent
  • Position: Post-Doctoral Researcher
  • Supervisors: Stefano Zacchiroli, Roberto Di Cosmo


This postdoc position is open at Inria in the context of the Software Heritage project, on the topic of large-scale graph analysis for the purpose of tracking the provenance of software source code artifacts, such as source code files and commits, as captured by state-of-the-art distributed version control systems (VCSs).
Inria is a national research institute dedicated to digital sciences that promotes scientific excellence and transfer. Inria employs 2,400 collaborators organised in research project teams, usually in collaboration with its academic partners. This agility allows its scientists, from the best universities in the world, to meet the challenges of computer science and mathematics, either through multidisciplinarity or with industrial partners.
Software Heritage is a unique initiative to build the universal archive of software source code, catering for the needs of research, industry and society as a whole.


With the help of the Software Heritage team, the recruited person will work on the analysis of the Software Heritage graph dataset, the largest publicly available corpus of Free and Open Source Software (FOSS) development history. The dataset already contains 6 billion unique source code files and 1 billion unique commits, collected from more than 90 million FOSS projects.

Main activities

To this end, the recruited person will:

  • Develop novel techniques to analyze such an interconnected dataset. Techniques to be considered include: scale-out analysis, by way of topological partitioning and distributed algorithms; and scale-up analysis, by means of graph compression techniques (e.g., from the web graph analysis community).
  • Develop compact and efficient representations of software source code provenance, i.e., where and when specific source code artifacts, such as source code files and VCS commits, have been observed across the entire dataset.
  • Quantitatively measure emerging properties of software source code provenance and characterize their evolution over the multi-decade time span covered by the Software Heritage archive.

The research activity will take place in a multidisciplinary team involving computer scientists, physicists, and industry leaders, with the purpose of developing and deploying solutions for tracking software source code provenance at scale. It will also involve documenting the work and results in conference proceedings and journal papers, and presenting the results at scientific meetings.

The keys to success

The ideal candidate will have obtained a PhD degree within the last 3 years, in computer science, applied math, computational science, or a related technical field, and will have proven experience in:

  • implementation of graph algorithms and their applications to sizable real-world graphs.
  • at least one state-of-the-art, large-scale, distributed computing platform (e.g., Apache Spark, or equivalent offerings by major public cloud providers).

The candidate is expected to be a proven consensus builder in a highly collaborative environment and to have excellent written and oral communication skills.

The following will be considered a plus:

  • Experience with the analysis of the Web Graph (or similar naturally-occurring complex networks) and related compression/analysis techniques.
  • Experience with machine learning and big code analysis.
  • Technical participation in popular Free/Open Source Software projects.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Flexible organization of working hours
  • Professional equipment available (videoconferencing, computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Instructions to apply

Applications must be submitted via email. The email must contain the subject line “postdoc provenance”. A complete application should include:

  • resume,
  • cover letter, and
  • links to any previous related work online.

All materials must be in a free format (such as plain text, PDF, or OpenDocument, and not Microsoft Word). Email submissions that do not follow these instructions will probably be overlooked.

Applications will be reviewed on a rolling basis until the position is filled.

Software Heritage is an equal opportunity employer and will not discriminate against any employee or applicant for employment on the basis of race, color, marital status, religion, age, sex, sexual orientation, national origin, handicap, or any other legally protected status. We value diversity in our workplace.