February 28, 2024

Pioneering the Future of Code Preservation and AI with StarCoder2

Software Heritage’s mission is to collect, preserve, and make the entire body of software source code easily available, with particular emphasis on Free and Open Source Software (FOSS) as a digital commons encapsulating decades of human ingenuity. We are dedicated not only to safeguarding this invaluable resource for future generations but also to maximizing its utility for science and software development, for the benefit of all.

Challenges and opportunities

The advent of Large Language Models (LLMs) capable of generating code presents both a challenge and an opportunity. The challenge lies in navigating the complex legal and ethical landscapes governing these innovations. To address this, we unveiled our statement on Large Language Models for Code in October 2023, outlining our guiding principles: openness, transparency, and respect for the authors.

The opportunity arises from the potential to make the vast body of knowledge embedded in humankind’s source code more accessible and reusable for a much broader community: this aligns perfectly with our core mission.

A first milestone with StarCoder2

Today, we are thrilled to see a first realization of this opportunity with the introduction of StarCoder2, the first-ever AI model for code developed using the comprehensive source code repository of the Software Heritage archive, and fully aligned with our principles for LLMs for code.

We congratulate Hugging Face, ServiceNow, and NVIDIA for their collaborative efforts to reach this important milestone within the BigCode project, showing commitment to ethical AI development, and advancing technology for the greater good. StarCoder2 is a significant step towards a world where the barriers to software development are lowered, where innovation is fueled by ethically developed AI, and where the digital commons serve as a foundation for future breakthroughs, ensuring that the knowledge derived from decades of software development benefits humanity at large.

Forward together

As we celebrate this milestone, we are reminded of the ongoing importance of our mission to collect, preserve, and share the wealth of human knowledge embedded in software.

We invite the global community to join us in this exciting new chapter.

Software Heritage is more than just a repository; it signifies a commitment to a future where every line of code adds to our shared legacy and collective advancement. Embark on this journey with us, because every line of code, and every step towards ethical AI, counts.
