We collect and preserve software in source code form, because software embodies our technical and scientific knowledge and humanity cannot afford the risk of losing it.
Software is a precious part of our cultural heritage. We curate and make accessible all the software we collect, because only by sharing it can we guarantee its preservation in the very long term.
You can search for the software origins (repositories, source packages, etc.) we have already archived and see when we visited them, much like a “wayback machine” for source code. Once you have identified an origin of interest, the web app lets you browse it as you usually do with version control system browsing interfaces.
You can trigger instant archiving of any source code repository that is not yet ingested in the Software Heritage archive, or that is not up to date.
This complements the regular crawling of software origins that is performed on the main code hosting platforms, and gives you the possibility of ensuring that the code you are interested in is properly archived.
You can seamlessly archive your research software artifacts, and add to your research articles precise references to specific versions of the source code, down to fragments of individual source files. Just follow the link below for the guidelines.
You can help rescue and curate landmark legacy source code while it is still possible to get hold of it and to talk to the people who created it. For this, you can follow the SWHAP process, developed in collaboration with UNESCO and the University of Pisa.
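As a rough illustration of what such a precise reference can look like, the sketch below builds a Software Heritage identifier (SWHID) for a single file and narrows it to a range of lines. It assumes the publicly documented SWHID scheme, in which a file content is identified by the same SHA-1 that Git computes for a blob; the function names `content_swhid` and `with_lines` are hypothetical helpers, not part of any official tool, and the guidelines remain the authoritative source.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Identifier for one file content (assumed scheme: Git blob SHA-1)."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def with_lines(swhid: str, first: int, last: int) -> str:
    """Append a 'lines' qualifier to point at a fragment of the file."""
    return f"{swhid};lines={first}-{last}"


if __name__ == "__main__":
    swhid = content_swhid(b"")  # an empty file
    print(swhid)  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(with_lines(swhid, 9, 15))
```

Because the identifier is derived from the bytes of the file alone, anyone holding a copy of the file can recompute it and check that a citation points at exactly the same content.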
We harvest publicly available source code from many software projects and keep up with development happening there. As of today our archive already contains and keeps safe for you:
Programmatic access to the content of the archive is available via the Software Heritage API.
The API lets you navigate the archive as a graph of development-related objects, such as file contents, directories, commits, and releases. With the API, developers can look up individual objects by their IDs, retrieve their metadata, and jump from one to another by following links, e.g., from commits to the corresponding directories or parent commits, or from releases to released commits. The API also gives access to crawling information, such as tracked software origins and the full list of visits performed on each of them. This makes it possible, for instance, to know when snapshots of a specific Git repository were taken and, for each of them, where each branch was pointing at the time.
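The navigation described above can be sketched with nothing more than the Python standard library. This is a minimal sketch, not an official client: the endpoint paths (`/origin/<url>/visits/`, `/<kind>/<id>/`) and the `date` and `snapshot` field names are assumptions based on the public API documentation and should be checked against it, and all helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed root of the public web API.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def visits_url(origin_url):
    """URL listing the visits recorded for one software origin."""
    return f"{API_ROOT}/origin/{origin_url}/visits/"


def object_url(kind, object_id):
    """URL of one archived object; kind is e.g. 'revision' or 'directory'."""
    return f"{API_ROOT}/{kind}/{object_id}/"


def fetch_json(url):
    """Perform a GET request and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def snapshots_by_date(visits):
    """Return (date, snapshot id) pairs for visits that produced a snapshot.

    Pairs are sorted on the date string, which is chronological as long as
    the dates are ISO 8601 timestamps in the same timezone.
    """
    pairs = [(v["date"], v["snapshot"]) for v in visits if v.get("snapshot")]
    return sorted(pairs)
```

A typical session would call `fetch_json(visits_url(...))` with the URL of an archived repository, pass the result to `snapshots_by_date`, and then fetch a snapshot of interest with `fetch_json(object_url("snapshot", snapshot_id))` to see where each branch pointed at that time. Keeping URL construction separate from the network call makes the helpers easy to test offline.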
Software is so pervasive in our lives that its preservation concerns all of us. Our mission and the archive we are building will serve the needs of the many, from cultural institutions to scientists and industries.
Everyone can help us achieve these ambitious goals.
Software is an important part of human production. It is also a key enabler for salvaging our entire digital heritage.
We collect, preserve, and make accessible source code for the benefit of present and future generations.
Science relies more and more on software. To guarantee scientific reproducibility we need to preserve it.
Amassing source code at this scale will be challenging, but will also enable the next generation of software studies.
Software is present in all industrial processes and products.
The universal source code archive we are building will help industry with provenance tracking, long-term archival, and software bill of materials.