Building the next generation object storage for Software Heritage
The mission of Software Heritage is to collect, preserve and share all the publicly available source code. With over 11 billion source files from more than 160 million projects, the Software Heritage archive is the largest collection of source code ever created.
Building the Software Heritage infrastructure is challenging, which is why we partnered with funders around the world to provide grants for experts who are willing to engage with the long-term mission of Software Heritage.
In early 2021, an innovative object storage was designed to make bulk operations on the entire corpus an order of magnitude faster, while incurring only marginal space overhead when storing small objects (the median object size is 4KB). Preliminary benchmarks were conducted during the summer of 2021 to confirm the expected results.
Thanks to a grant from the NLnet Foundation, Easter-eggs will now address the next challenge: implementing this novel approach in Software Heritage, with a first prototype running against real-world workloads.