
January 10, 2019

Save code now with Software Heritage


Just as search engines use web crawlers to index the internet, the Software Heritage Archive uses its own specialized crawler to collect and preserve source code. Because there is no universal way to gather code from hosting platforms such as GitHub, GitLab, PyPI, and Debian, we had to build one ourselves. (A complete list of the platforms we cover, from Bitbucket to SourceForge, is available on the Archive.) We revisit these platforms regularly so that the archive holds the most recent versions of the projects they host.

Initially, we focused on partnering with major code repositories. However, we realized many individuals also wanted to contribute directly. That’s why we introduced the ability to push code to the archive. Researchers, for example, can now upload code directly related to their scientific papers, ensuring immediate archival.

Now, we’ve made it even easier: anyone can submit the URL of a public version control repository, and we’ll archive it for you after a quick review. This means you control when your code is added to the archive, and you can even use it to speed up the archival of a repository on a platform we already crawl.

We support the following origin types:

  • Git (git)
  • Mercurial (hg)
  • Subversion (svn)
  • CVS (cvs)
  • Bazaar (bzr)
  • Tarball (tarball): supported formats are .jar, .tar, .tar.bz2, .tar.gz, .tar.lz, .tar.xz, .tar.zst, and .zip
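If you submit many URLs, it can help to check them against the list above before sending anything. The helper below is a hypothetical client-side sketch, not part of any Software Heritage tool: `is_supported`, `SUPPORTED_VISIT_TYPES`, and `TARBALL_SUFFIXES` are names invented here, and the archive's own review remains the authority on what it accepts.

```python
from urllib.parse import urlsplit

# Hypothetical mirror of the origin types listed above (not an official list).
SUPPORTED_VISIT_TYPES = {"git", "hg", "svn", "cvs", "bzr", "tarball"}

# Tarball formats listed above; str.endswith accepts a tuple of suffixes.
TARBALL_SUFFIXES = (
    ".jar", ".tar", ".tar.bz2", ".tar.gz",
    ".tar.lz", ".tar.xz", ".tar.zst", ".zip",
)


def is_supported(visit_type: str, url: str) -> bool:
    """Return True if (visit_type, url) looks like a supported origin.

    Only a quick sanity check: it does not contact the archive or
    verify that the repository actually exists or is public.
    """
    if visit_type not in SUPPORTED_VISIT_TYPES:
        return False
    if visit_type == "tarball":
        # Check the file extension on the URL path, ignoring any query string.
        path = urlsplit(url).path.lower()
        return path.endswith(TARBALL_SUFFIXES)
    # For VCS types, any URL passes this local check.
    return True
```

For example, `is_supported("tarball", "https://example.org/pkg-1.0.tar.gz")` returns `True`, while a `.rar` archive or an unlisted type such as `"darcs"` is rejected.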

Ready to archive your work? Visit archive.softwareheritage.org and click “Save code now” in the left-hand menu, or watch this short tutorial on YouTube. Developers can also use a dedicated API endpoint.
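As an illustration of the API route, here is a minimal sketch using only the Python standard library. It assumes the endpoint pattern `POST /api/1/origin/save/<visit_type>/url/<origin_url>/` described in the Software Heritage Web API documentation; check that documentation for the current path, authentication options, and rate limits before relying on it. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed API root of the public archive.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def save_request_url(visit_type: str, origin_url: str) -> str:
    """Build the (assumed) save-request endpoint for an origin.

    ':' and '/' are left as-is so a plain repository URL is embedded
    verbatim; characters such as '?' and '=' are percent-encoded.
    """
    return f"{API_ROOT}/origin/save/{visit_type}/url/{quote(origin_url, safe=':/')}/"


def submit_save_request(visit_type: str, origin_url: str, timeout: float = 30.0) -> dict:
    """POST a save request and return the decoded JSON response.

    Makes a real network call; the response fields are defined by the
    archive's API documentation, not by this sketch.
    """
    req = Request(save_request_url(visit_type, origin_url), method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `submit_save_request("git", "https://github.com/<owner>/<repo>")` would queue the repository for the same quick review as a request made through the web form. Sending a GET to the same URL is, per the API documentation, how you check on earlier requests.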

By using this feature, you’re not just archiving your own work; you’re helping us build the “Library of Alexandria of code,” preserving this invaluable part of our cultural heritage for generations to come.