Collecting, preserving, organizing, and sharing source code at the Software Heritage scale is a challenging undertaking. We look forward to collaborating with scientists who are interested in meeting their match!
To collect all publicly available source code and keep it current, we need to efficiently discover the many places where source code is made available over the Internet, ingest its content, and track changes.
We collect the development histories of source code from many different version control systems. These histories must be captured in a generic and extensible data model that can stand the test of time over the very long term.
To support scientific dissemination and reproducibility, the source code used in scientific articles needs to be collected in Software Heritage and properly cross-referenced with all Open Access platforms.
If you want to help us tackle the scientific challenges posed by source code preservation at this scale, consider participating in Software Heritage research activities. You can start from the following resources:
We host several scientific working groups to discuss and act on the challenges of source code collection, long-term preservation, indexing, and more. We will start operating these working groups incrementally over the coming months, but several charters are already available.