September 25, 2025

CodeCommons annual review – 2025-09-22

CodeCommons celebrated its first anniversary by bringing the community together for a plenary event at Inria Paris on September 22, 2025. The day was dedicated to reviewing our progress, sharing a common vision, and diving into the technical work that’s bringing the project to life.

Shared vision and status update

The morning session set the stage with a look at the big picture and a comprehensive status report:

Introduction and vision: Roberto Di Cosmo kicked off the day, reinforcing the mission and future path of CodeCommons.
Status report: Benoit Chauvet and team representatives provided detailed updates on all ongoing activities, showing how the individual pieces are coming together.

Technical tracks and featured presentation

The afternoon split into parallel technical tracks, allowing for in-depth presentations from each task:

Technical track 1: Focused on the core plumbing, namely infrastructure and the unified data model.
Technical track 2: Explored advanced topics such as code analysis, similarity, and AI preferences.
Guest talk: The day concluded with a special presentation on the Unified Data Architecture at Netflix, delivered by Alexandre Bertails.

A meaningful milestone

Thanks to everyone who joined our annual review! Your engagement and thoughtful questions made the event lively and meaningful. It was fantastic to gather the community, share essential information, and reinforce our common vision. It’s truly encouraging to see the project taking shape and the pieces coming together. Check out all sessions below.

Morning session

Introduction and vision, Roberto Di Cosmo

Status report, Benoit Chauvet and team

DiverSE Team activities review (pre-recorded)

Project context metadata collection

Programming language identification

Download morning plenary slides

Technical track 1: Infrastructure and unified data model

Slides

– HPC Infrastructure (Simeon Carstens)
– SWH Fuse (Martin Kirchgessner)
– Storage Compression (Francesco Tosoni)
– Open Source Vulnerabilities (Valentin Lorentz)
– Collect project context data (Caroline Landry)
– Programming language detection (Baptiste Mehat)
– License detection (Philippe Ombredanne)

Technical track 2: Code analysis, similarity, and AI preferences

Slides

– LLM baseline training (Djamé Seddah)
– Similarity detection (Gaël de Chalendar)
– Similarity detection (Leonardo Venuta)
– Plagiarism detection (Andrea Gurioli)
– Design patterns detection (Yassine Abdeljalil)
– AI preferences (Thomas Aynaud)

Software Heritage
