A new era of software engineering, cybersecurity, & AI
A recent talk by Director Roberto Di Cosmo highlights how, 10 years in, Software Heritage aims its ‘large telescope’ at the future of code.
CodeCommons aims to address these issues, making source code and metadata available in a single, accessible location. It will implement standardized data pipelines for cleaning and preprocessing, provide traceability through identifiers, and incorporate ethical considerations, such as attribution and similarity checks.
CodeCommons aims to provide a centralized repository of essential resources, including code, documentation, and metadata, to facilitate the creation of smaller, more effective datasets for the next generation of AI tools.
CodeCommons is a two-year project building on the Software Heritage archive. Here’s an overview of the projects we and our partners are working on.
CodeCommons, a two-year project funded by the French government, is building on Software Heritage—the world’s largest public source code archive—to create higher-quality datasets for responsible artificial intelligence.