Software Heritage is now using GitLab!
We are delighted to announce that Software Heritage development and operations are now done on our own GitLab instance. It will improve our workflow and enable easier contributions from all interested in preserving software history. Thanks to the efforts of our sysadmin team, helped by Open Tech Strategies, we were able to smoothly transition from our previous system.
From Phabricator to GitLab
Software Heritage started in 2015 with only a couple of developers and basic archival features. Like almost all software projects nowadays, we had to pick a tool to coordinate development and to share source code and improvements. Such tools are colloquially called “forges”. Back then, Phabricator was considered cutting-edge, so it was adopted as the Software Heritage forge.
Over the years, more and more contributors and team members have improved Software Heritage. Its code and infrastructure grew larger and more complex as new services were added. Phabricator mostly served the project faithfully, but it was an unfamiliar tool for casual contributors, which opened a discussion about migrating to another tool that would make contributing to Software Heritage easier. In parallel with the decision to migrate away, the company behind Phabricator announced the end of support in 2021, which accelerated the move towards an actual migration.
GitLab emerged as the most relevant solution. Its interface, its features and the development practices it encourages (“Git-centered forks and merge requests”) are more aligned with what most developers now expect. And like Phabricator, GitLab Community Edition is free software. We can thus continue to host and maintain the development platform on our own infrastructure.
On January 6th 2023, our Phabricator forge was “frozen”, and after a successful migration, the GitLab Software Heritage forge officially went live on January 9th 2023.
This has been a long journey, in which we could rely on the strong cooperation of Open Tech Strategies (OTS), who created a powerful tool for forge migrations known as Forgerie.
In the rest of this blog post, we would like to share our experience of this long and critical but exciting migration process.
- 2019: Decision to migrate away from Phabricator
- 2020: Choice of GitLab + first contact with OTS
- 2020-05: Start of the collaboration with OTS, migration process design iterations
- 2021-01: First Forgerie prototype of Phabricator/GitLab migration
- 2021-02 to 2022-05: iterative demos of Forgerie and feedback
- 2022-05: First delivery of the ready-to-use script
- 2022-05 to 2022-09: iterations on the SWH side, spawning of staging and production infrastructure
- 2022-09: GO for the migration
- 2022-10-15: Sysadmin projects migrated in production
- 2023-01-09: Rest of the forge migrated in production
The migration covers the following items of the forge:
- Task management
- Code repositories
- Code reviews
- Project management
- Pastes (code snippets)
Other information will be kept accessible for historical reference via “frozen” copies of the old forge web pages.
For our Continuous Integration (CI), we chose to continue using Jenkins as is. We only had to update the repository URLs.
Phase 1: Collaboration with OTS/Forgerie
Open Tech Strategies had developed a tool called Forgerie to automate the migration of software projects from one forge to another.
The tool is generic: it models data using an intermediate representation which can be used by many communities, whichever forges they might be migrating from or to. We chose to contract OTS to benefit from their expertise and to improve what we viewed as a useful tool for every software project facing the same situation. An account of the experience can be read on their blog.
Phase 2: Forgerie script benchmarking and tuning
From May to September 2022, the Software Heritage sysadmin team led a series of iterative migration tests and extra developments on the Forgerie scripts, in order to fine-tune the migration process to what we expected our future GitLab workflows to be.
One of our requirements was to be able to link migrated issues back to the original Phabricator tasks and, if possible, to map Phabricator task IDs to GitLab issue IDs.
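To make the back-linking requirement concrete, here is a minimal sketch of how a migrated issue description could carry a link to its original task. It is an illustration only, not Forgerie's actual code: the function names and the `phabricator.example.org` base URL are hypothetical. It relies only on the fact that Phabricator task pages live at `/T<id>`.

```python
def phabricator_task_url(base_url: str, task_id: int) -> str:
    """Build the URL of a Phabricator task, which lives at /T<id>."""
    return f"{base_url.rstrip('/')}/T{task_id}"


def with_backlink(description: str, base_url: str, task_id: int) -> str:
    """Append a Markdown link to the original task to an issue description.

    Hypothetical helper: the footer wording is an assumption.
    """
    link = phabricator_task_url(base_url, task_id)
    footer = f"Migrated from Phabricator task [T{task_id}]({link})."
    return f"{description.rstrip()}\n\n{footer}"


# Example with a placeholder forge URL.
print(with_backlink("Loader crashes on empty repo\n",
                    "https://phabricator.example.org/", 1234))
```

Keeping the original task ID in the issue text means that old references such as “T1234” stay searchable even when the GitLab issue number differs.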
With Open Tech Strategies, we also had to decide how to map some Phabricator concepts into GitLab entities.
Here’s a list of the mappings we used:
| Phabricator | GitLab | Notes |
|---|---|---|
| Diff | Merge request | Only the latest version of a Diff is migrated by Forgerie (focusing on the current WIP) |
| Projects | Labels | Some Phabricator projects attached to repositories were tagged as “Primary Projects”, and used to decide to which GitLab projects issues were assigned |
| Comment (on Task or Diff) | Note | |
| Inline review comment (with code suggestions) | Discussion | |
Incremental migration capabilities
We needed to be able to perform the migration in separate batches, so that we could migrate groups of projects at different stages.
Phase 3: First migration sprint on infrastructure projects
We initially focused on migrating infrastructure-related projects (deployment manifests, etc.), as a way to force ourselves to “eat our own dogfood”: our infrastructure team, which was at the helm of the GitLab migration, could start to use GitLab workflows for real while minimizing the impact on the rest of the team.
Phase 4: Complete migration in staging and team sprint
After the successful migration of infrastructure projects and workflows, we performed the complete migration process (as an increment on top of the live GitLab data) in a full staging GitLab instance. This allowed the rest of the team to start familiarizing themselves with GitLab, and to iterate on the final shape of the migrated data, as well as experiment on GitLab-based development workflows.
We concentrated this experimentation effort in a team-wide sprint in the last weeks of 2022, with the goal of doing the migration for real in the first week(-end) of 2023. During this sprint, we made several key structural decisions:
Phabricator has a flat model and uses Projects as tags to organize all entities, while GitLab is organized in a hierarchy of groups and subgroups. We introduced a way for Forgerie to create a GitLab group hierarchy, and map Phabricator Repositories into this group hierarchy according to their Phabricator Project “tags” and other heuristics.
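The flat-to-hierarchical mapping described above can be sketched roughly as follows. This is an illustrative simplification, not the logic Forgerie actually implements: the tag names, group paths and fallback rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping from Phabricator project tags to GitLab group paths.
TAG_TO_GROUP = {
    "sysadmin": "swh/infra",
    "development": "swh/devel",
}
DEFAULT_GROUP = "swh/misc"  # assumed fallback when no tag matches


@dataclass
class Repository:
    name: str
    project_tags: list[str] = field(default_factory=list)


def gitlab_path(repo: Repository) -> str:
    """Return a GitLab namespace path for a flat, tagged repository."""
    for tag in repo.project_tags:
        if tag in TAG_TO_GROUP:  # the first matching tag wins
            return f"{TAG_TO_GROUP[tag]}/{repo.name}"
    return f"{DEFAULT_GROUP}/{repo.name}"


def groups_to_create(repos: list[Repository]) -> list[str]:
    """List every group and subgroup path needed, parents before children."""
    needed = set()
    for repo in repos:
        parts = gitlab_path(repo).split("/")[:-1]  # drop the repository name
        for depth in range(1, len(parts) + 1):
            needed.add("/".join(parts[:depth]))
    return sorted(needed, key=lambda path: (path.count("/"), path))


repos = [
    Repository("puppet-env", ["sysadmin"]),
    Repository("swh-storage", ["other", "development"]),
    Repository("scratch"),
]
for repo in repos:
    print(repo.name, "->", gitlab_path(repo))
print(groups_to_create(repos))
```

Listing parent groups before their children matters because a GitLab subgroup can only be created once its parent exists.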
As the permission models in Phabricator and GitLab differ widely, and our own permission structure is quite simple, we decided to handle GitLab group memberships and project permissions manually after the migration, leveraging our new nested group structure and simple scripts calling the GitLab API. Forgerie was only used to set the visibility of migrated repositories (public or private) and the confidentiality flag for issues, at the point of migration.
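As an idea of what such a “simple script” might look like, the sketch below builds a request for GitLab's group members endpoint (`POST /groups/:id/members`, with `user_id` and `access_level`). It is not the script the team actually used; the base URL, token and IDs are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request

# GitLab access levels as defined by the members API.
DEVELOPER = 30
MAINTAINER = 40


def add_member_request(base_url: str, token: str, group_id: int,
                       user_id: int, access_level: int) -> urllib.request.Request:
    """Build (but do not send) a request adding a user to a group."""
    payload = json.dumps(
        {"user_id": user_id, "access_level": access_level}
    ).encode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v4/groups/{group_id}/members",
        data=payload,
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: replace with a real instance URL, token and IDs.
req = add_member_request("https://gitlab.example.org", "secret-token",
                         group_id=25, user_id=42, access_level=DEVELOPER)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would perform the call. With nested groups, a membership granted on a parent group is inherited by its subgroups and projects, which keeps the number of such calls small.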
External contributions will happen through the usual fork-and-merge-request workflow: fork the repository to your own user namespace, push your changes to a branch, and ask for review on the main project through a merge request.
We’ve decided to use the same workflow for team members to keep our documentation consistent.
The updated contribution documentation will soon be available on https://docs.softwareheritage.org/devel/
In the final iterations, we realized that putting all Software Heritage engineering projects (infrastructure and development) under a main group would simplify the team workflow by leveraging shared issue labels and milestones. We thus modified Forgerie to create labels directly in the top-level group.
Phase 5: Final migration
Satisfied with the state of the staging instance, we turned Phabricator read-only and performed the migration to the production instance over the weekend of January 6th 2023. After fixing a one-character typo that prevented the migration of some tasks, the process completed in the evening of Sunday January 8th. GitLab was ready for our final post-migration steps on January 9th, right on schedule.
The team met for an all-hands over the following days, performed some post-migration issue triage, and updated all automations (CI and remaining deployments) to use GitLab instead of Phabricator.
All existing Phabricator tasks have been locked, with a link to the corresponding GitLab issue, and redirects have been put in place for git checkouts over HTTPS to minimize disruption of third-party workflows. Pushes over HTTPS and SSH to Phabricator repositories have been blocked.
- Set up more redirects away from Phabricator for features that exist in GitLab.
- Turn the remaining Phabricator data, particularly historical code reviews, into static pages (leveraging some work done by the Mercurial community).
- Improve CI integration. We are considering migrating to GitLab CI due to some limitations in Jenkins.
The migration went smoothly thanks to careful and long-running planning, quality contributions from Open Tech Strategies across many iterations, and feedback from the whole team.
The comprehensive nature of the data migrated by Forgerie allowed us to preserve the full history of issues and the contents of most code reviews. This let us resume work with minimal disruption, at least considering the scale of the change.
We’ve kept our Phabricator instance in read-only mode so that we can refer to it whenever we need to access “historical” information, and we will turn it into static pages to keep this information available in the long term.
After a couple of weeks, it seems that we’ve rarely had to seek information directly in Phabricator, which confirms the successful and comprehensive coverage of the migration.
We’re confident that GitLab will improve the experience for our ever-increasing number of contributors, in all aspects of the project, as it already has for our team.