Software Heritage: Terms of use for bulk access


The core mission of Software Heritage is to collect, preserve and share all software that is publicly available in source code form. Our ambition is to build a long-term common infrastructure that supports a variety of applications in the areas of cultural heritage, industry and research.

For this reason, Software Heritage is established as a non-profit initiative, preserving our software commons for the benefit of all.

Therefore, the following terms of use for accessing the archive contents are designed to be sufficiently permissive to remain consistent with the approach and spirit of our mission.

The few usage restrictions aim to ensure proper operation of the core infrastructure, and to protect from abusive behaviour the people who, through their work and dedication, created the very software commons we are preserving.

More precisely, usage restrictions are justified by:

  1. technical reasons

The Software Heritage archive is a very large data set, comprising billions of individual files and tens of billions of relationships. The Archive access we provide is designed to allow large-scale analysis of the Archive contents, with the hope that the results will contribute to improving the overall quality of the software we all rely upon. Rate limitations may be put in place if needed to guarantee fair access to the Archive for all users. The Archive access is not intended as a means of making copies of the Archive contents for external use: if that is what you are looking for, consider establishing a mirror.

  2. ethical and legal reasons, involving in particular the protection of personal data.

The Software Heritage archive collects publicly available source code, and its development history, from a variety of public sources. Any personal information that may be contained in the source code or in the development history will hence be collected in the archive. By accessing the content of the archive, you may get access to this personal information too.

We explicitly ask you to refrain from misusing any such personal information, or providing third parties a means to misuse it.

Mass mailing software developers is a well-known example of misuse that is clearly unacceptable, but there may be many others. We may update these terms to explicitly forbid some of these misuses, but you should not consider that whatever is not explicitly forbidden is implicitly blessed: as a rule of thumb, if you have in mind a particular use of personal information which would bother you as a developer, that is a good indicator that you should abandon the idea there and then.

Extracting a significant part of the Archive contents, and moving it outside the platform through which you are accessing it, will in no way exempt you from these obligations.

The precise terms and conditions follow. If you are entering this agreement on behalf of a company or other legal entity, you represent that you have the authority to bind such entity to the following terms and conditions. If you do not have such authority or if you do not agree with the following terms and conditions, you are not entitled to use the Software Heritage archive.

1. Definitions

1.1 Archive

The Software Heritage archive, referred to simply as “Archive” in the following, contains all the source code collected by Software Heritage, together with all related information collected or produced by Software Heritage, such as revisions, releases, file contents and project metadata, as well as additional factual information that may include license information.

2. Accessing the Archive

If you respect all the terms of the present agreement, you are granted access to the contents of the Archive, with the following provisions.

2.1 Quota and other technical limitations

Access quotas or other technical limitations may be put in place at any time to ensure that the load on the infrastructure is acceptable and that available resources are fairly shared among the different users. Any attempt to circumvent these quotas or other limitations is considered a breach of these terms of use, and you may be permanently banned from accessing the Archive as a consequence. If you need higher throughput, please consider establishing a Software Heritage mirror: this requires more work than just abiding by these terms, but you will get greater control while at the same time helping the global mission of Software Heritage.

2.2 No massive data extraction

In order to ensure that these terms of use are consistently applied, extracting significant parts of the contents of the Archive is not authorized.

3. Using the content of the Archive

3.1 Personal data

Software in the Archive may contain personal information such as, but not limited to, developer names and email addresses.

Any systematic use of third-party personal information that would cause unjustified prejudice to the very people who built the software commons preserved by the Archive is forbidden and will be considered a breach of this agreement. This includes, but is not limited to:

  • use of the information in a way that would violate people’s privacy or security, or to send any unsolicited requests, whether commercial or not (or any other similar solicitation)

  • building developer profiles to be used without the developer’s consent

In addition, use of personal information may be subject to legal regulations that apply to you, independently of how you assembled it, and you are solely responsible for any violations that may result from your use of the information that you obtained from the Archive.

3.2 Metadata

The Archive contains a variety of metadata related to the source code, like provenance information and harvesting time, or computed information like file type, file length and programming language. Such metadata is considered factual, not covered by copyright, and may be freely used, as long as you comply with article 3.1 of the present terms and conditions.

3.3 No endorsement

You may not publicly represent or imply that we participated in, supported, or approved the way you use the content of the Archive, in particular when such a use is unlawful.

4. Rights and regulations

4.1 Third party’s rights

We do not expressly or tacitly warrant that access to the Archive contents does not infringe any third party’s intellectual property rights, and we disclaim any and all liability.

4.2 Software licenses

All software components present in the Archive may be covered by copyright, or other rights like patents or trademarks. Software Heritage may provide automatically derived information on the software license(s) that may apply to a given software component, but it makes no claim of correctness and the license information provided does not constitute legal advice. You are solely responsible for determining the license or other rights that apply to any software component in the Archive, and you must abide by their terms.

4.3 Status of Results

Whether the results of the analysis and processing that you perform on the Archive contents are covered by these terms of use depends on the particular analysis or processing you actually perform. As stated in 2.2, massive data extraction is not allowed, but more generally, you must not make available to third parties results that would enable them to violate the provisions of these terms of use, for example by publishing a database of all the software developers’ emails extracted from the Archive.

5. No warranty, liability or damages

We disclaim, to the full extent authorized by applicable law, all warranties of any kind, express or implied, on the use of the Archive. In particular, we draw your attention to the following points.

5.1 Fitness for purpose

We do not give any warranty that the contents of the Archive are fit for any particular purpose.

5.2 Accuracy

We are making our best efforts to ensure that the contents stored in the Archive, be they source code or any kind of additional data or metadata, are preserved for the long term, traceable, and faithful.

However, should such information be incomplete or inaccurate, we shall not be held liable for it.

6. Third parties’ additional terms of use

6.1 Third Parties

In order to prevent information loss and make sharing easier, Software Heritage will progressively build a distributed infrastructure in collaboration with its partners and sponsors. Hence, you may access the Software Heritage Archive through infrastructures belonging to and/or operated by third parties, which may specify additional restrictions, and you should refer to their own terms of use in such cases.

7. Breach of terms of use

In the event of a breach of the terms and conditions stated herein, the present agreement will be terminated automatically. You shall no longer be authorized to access the Archive.

You may continue to use the material you extracted or derived from the Archive before the agreement’s termination, under the terms of each software component’s own license, provided you abide by the obligations set forth in Sections 3, 4 and 5 of these terms of use.

8. Applicable law / jurisdiction

The present terms and conditions shall be governed by the laws of the French Republic and interpreted accordingly. Any dispute relating to the application, interpretation, or termination of these terms and conditions shall be exclusively settled by the French courts.

9. Miscellaneous

We reserve the right, at our sole discretion, to amend these terms and conditions at any time and to change the characteristics (including technical characteristics) of the tools provided on the API at any time. We will notify you of these changes by posting a notice on our Website.

These terms and conditions may be assigned by us and will be binding upon, and benefit, our assignee.

Any provision of the terms and conditions that may be declared invalid or illegal by a competent judge shall be without effect, but its invalidity shall not affect the other provisions of the terms and conditions, nor the validity of the terms and conditions as a whole or their legal effects.

If you have any question concerning the present terms and conditions, please contact