September 24, 2025

GRNET establishes new Software Heritage mirror

Our digital world runs on source code, but keeping it all safe for the future is a massive challenge. That’s why every new, strong mirror for a project like Software Heritage is a huge step forward. Each one is a major commitment to the future of innovation and knowledge. 

The newest addition to this network is GRNET, which has just made a full copy of the Software Heritage universal source code archive available. GRNET is Greece’s National Infrastructures for Research and Technology, a leading public-sector technology company that has provided advanced network and IT services to academic and research institutions since 1998. It operates under the auspices of the Ministry of Digital Governance.

GRNET has taken another bold step in digital preservation. Recognizing that software is as vital to our cultural heritage as books and scientific data, GRNET is expanding its commitment to Open Science by contributing to Software Heritage through hosting a Software Heritage Mirror in partnership with the National Institute for Research in Digital Science and Technology (Inria). The starting point was the FAIRCORE4EOSC project, through which GRNET successfully deployed a Software Heritage Mirror: a complete, independent copy of the world’s largest universal source code archive, in GRNET’s datacentres.

— Kostas Koumantaros, Head of Strategy and Proposals

Unit of the European Infrastructures and Projects Directorate

This new mirror isn’t just about copying data; it’s about making our shared digital memory much stronger. The GRNET mirror, put together as part of the European FAIRCORE4EOSC project, is the newest member of the Software Heritage international network of mirrors. The first was launched in 2023 by ENEA, the Italian National Agency for New Technologies, Energy and Sustainable Economic Development.

By hosting and operating a dedicated Software Heritage Mirror, GRNET will ensure that Europe’s research and innovation community — and particularly EOSC — benefits from reliable access to this invaluable digital resource. GRNET continues to embody and strengthen its role as Greece’s gateway to global science and a driving force for Open, FAIR, and sustainable digital infrastructures.

— Themis Zamani, Head of Implementation

Unit of the European Infrastructures and Projects Directorate

Why this mirror matters

The GRNET mirror is a significant development for three primary reasons: preservation, accessibility, and collaboration.

  • Preservation: Having a copy of the archive located at GRNET adds a whole new layer of protection. This helps make sure this crucial collection sticks around for future generations. Losing foundational code would be a huge problem, so this mirror acts like a super-important digital backup.
  • Accessibility: Bringing the archive closer to users means things load faster and perform better. For researchers, developers, and organizations relying on Software Heritage, this translates to quicker access and a smoother experience. This is key for wider use and making the most of the archive.
  • Collaboration: Beyond the tech, the GRNET mirror is expected to boost engagement with local communities of researchers, developers, and institutions. It becomes a hub for innovation, a place for sharing knowledge, and a practical resource that can support local digital projects.

How big is this effort? Way more than just copying files

This is a massive undertaking. The Software Heritage Archive holds over 20 billion unique source files. To give you some perspective, imagine a digital library so huge it contains most of the publicly available code ever written. Storing such an enormous dataset requires serious infrastructure.

At GRNET, they’ve built a thoughtful architecture to handle this. The core services use an Apache Cassandra cluster for data management, currently holding 50 terabytes. For the front end and search functions, an NFS cluster provides a whopping 1.5 petabytes of storage. This supports services like Elasticsearch, which manages over 521 million indexes. It’s a substantial setup, also including PostgreSQL, RabbitMQ, and dedicated SWH Scheduler, worker, and search components.

The “replaying services”, the systems that reconstruct the history of code, are quite impressive, with 220 content replayers and 120 graph replayers. This complex system helps ensure that the relationships and versions between billions of files are accurately captured and easy to find. Plus, the whole operation, from setting it up to keeping it running, has been automated using Ansible.

The full synchronization (an “ingestion full synchronization”) took 12 months to bring in all 20 billion-plus unique source files. That just shows the dedication and technical skill needed to make such a huge resource available.
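
To put the 12-month figure in perspective, here is a rough back-of-envelope calculation of the average ingestion rate. It uses only the numbers quoted above. Two things are assumptions made for illustration: that the 12 months were 365 days of continuous operation, and that the load was spread evenly across all 220 content replayers. The real deployment almost certainly had pauses and uneven load, so treat the result as an order of magnitude, not a measurement.

```python
# Back-of-envelope ingestion rate, using only figures quoted in the article.
FILES = 20_000_000_000      # "over 20 billion unique source files" (lower bound)
CONTENT_REPLAYERS = 220     # content replayers mentioned in the article

# Assumption: the 12-month synchronization is treated as 365 days
# of uninterrupted operation.
SECONDS = 365 * 24 * 60 * 60


def average_rate(files: int, seconds: int) -> float:
    """Average number of files ingested per second over the whole period."""
    return files / seconds


overall = average_rate(FILES, SECONDS)

# Assumption: work is spread evenly across all content replayers.
per_replayer = overall / CONTENT_REPLAYERS

print(f"overall: roughly {overall:.0f} files per second")
print(f"per content replayer: roughly {per_replayer:.1f} files per second")
```

Even under these simplified assumptions, the mirror had to sustain several hundred files per second, around the clock, for a full year.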

Designed for resilience: Providing a safe copy accessible over time

The GRNET mirror isn’t just a quiet storage facility; it’s accessible online, built to reinforce trust and to support the discovery of research software across Europe. All that hard work has resulted in:

  • An advanced search engine: So you can easily navigate the huge archive.
  • An advanced query API (REST and gRPC): This opens up the archive for programmatic access and lets it integrate with other tools and workflows. Super useful for researchers and developers wanting to use the archive in new ways. You can check out some of these features at https://ui.swt.grnet.gr.
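
As a small illustration of what programmatic access can look like, the sketch below computes the Software Heritage identifier (SWHID) of a file's content and builds a REST lookup URL for it. The content identifier uses the same hash Git uses for blobs, so it can be computed locally. The URL part rests on assumptions: the endpoint path `/api/1/content/sha1_git:<hash>/` is modeled on the main Software Heritage Web API, and it is only a guess that the GRNET mirror exposes the same path under the host quoted above. The helper names are hypothetical. The code makes no network request.

```python
import hashlib

# Host as quoted in the article. That the REST API is served from this
# host, under the path used below, is an assumption for illustration.
MIRROR_BASE = "https://ui.swt.grnet.gr"


def git_blob_sha1(data: bytes) -> str:
    """Return the Git blob hash of `data` (SHA-1 over 'blob <len>\\0' + data)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file content object: swh:1:cnt:<git blob hash>."""
    return f"swh:1:cnt:{git_blob_sha1(data)}"


def content_api_url(data: bytes, base: str = MIRROR_BASE) -> str:
    """Build a lookup URL for the content (hypothetical helper).

    The path mirrors the main archive's Web API; the mirror may differ.
    """
    return f"{base.rstrip('/')}/api/1/content/sha1_git:{git_blob_sha1(data)}/"


if __name__ == "__main__":
    sample = b"hello world\n"
    print(content_swhid(sample))
    print(content_api_url(sample))
```

Because the identifier is derived from the content itself, the same file yields the same SWHID whether it is looked up in the main archive or in a mirror.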

What’s next: Keeping it going and growing

Setting up a mirror like this demands both deep technical know-how and meeting legal requirements, including signing agreements and finding all the necessary resources. GRNET has stepped up to this commitment. Their approach involved a detailed pilot deployment to test the architecture and figure out resource needs, followed by the full production deployment.

For long-term sustainability, an agreement with France’s Inria ensures GRNET will support the mirror for three years from the project’s completion in May 2025. GRNET will operate and support the mirror using its own resources, while also exploring other funding options.

The effort highlights the collaborative and forward-thinking spirit needed to protect our digital heritage. This mirror is a great example of how to tackle preservation, accessibility, and collaboration in the open science world.

If you’d like to learn more about hosting a Software Heritage Mirror, check out the Mirror Network page.