Software Heritage is now open for your source code submissions!
Search engines use a Web crawler to systematically “crawl,” or look through, the Web and build searchable indexes. To populate the Software Heritage Archive we also use a crawler. Unfortunately, unlike for ordinary websites, there is no standard for collecting information from code hosting platforms, so we had to build our own crawler for this task, one that can be extended to collect data from different code hosting platforms. It already tracks major source code hosting sites, like GitHub, GitLab, PyPI, and Debian (the current list is online). Since we first ingested these sites into the Archive, we have been crawling them periodically so that the latest source code is available. This keeps the Archive up to date with the newest releases of projects.
The sites initially represented on Software Heritage were picked for two reasons: they host large amounts of code, and they were interested in working with us. However, we recognized that individuals would also want to participate directly in the archiving process. That’s why in September we made it possible to “push” to Software Heritage, allowing researchers to upload code associated with scientific papers themselves. Rather than waiting for Software Heritage to crawl and ingest their sites, those using open access platforms to deposit source code could push that source code to the Archive.
More recently, we expanded the pool of who can push to Software Heritage. Now, anyone can share the URL of a public version control system and we will archive it (well, after a bit of moderation). This allows you to add your code to the Archive when you want to. It can also be used to trigger a faster archival of a repository stored on a code hosting platform we already track. Currently this only works for repositories using Git, but support for Subversion and Mercurial will soon be added.
In order to push code to the Archive, go to archive.softwareheritage.org and click on “Save code now” in the menu on the left side of the page. For interested developers, there is also a dedicated API endpoint.
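For developers who prefer scripting, a save request could be sketched in Python roughly as follows. This is a minimal sketch, not an official client: the endpoint layout (`/api/1/origin/save/<visit type>/url/<origin URL>/`) is an assumption and should be checked against the current API documentation, the helper names `save_request_url` and `request_save` are made up for illustration, and the repository URL is a placeholder.

```python
import json
import urllib.request

# Assumed API root and path layout; verify against the API documentation.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def save_request_url(origin_url: str, visit_type: str = "git") -> str:
    """Build the (assumed) "Save code now" endpoint URL for a repository."""
    # Only Git repositories are accepted at the time of writing.
    if visit_type != "git":
        raise ValueError("only Git repositories are supported at the moment")
    # Assumption: the repository URL is appended to the path as-is.
    return f"{API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"


def request_save(origin_url: str, visit_type: str = "git") -> dict:
    """Send the save request over the network and return the decoded JSON reply."""
    req = urllib.request.Request(
        save_request_url(origin_url, visit_type), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder repository URL; replace it with a public Git repository.
    print(save_request_url("https://example.org/user/project.git"))
```

Calling `request_save` performs a real request, and the submission may still wait for moderation before the repository is archived.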
This new feature allows you to archive your own work and help us with our mission. You can now push whenever you want and help to build the Library of Alexandria of code, preserving this invaluable part of our cultural heritage.