Sharing the latest progress on NeSI’s platform refresh

It has been a very busy few months since we first announced our platform refresh. We’ve been working behind the scenes installing new equipment and collaborating with our vendor partners to configure the new environments and prepare the platforms for project and user migration in the coming months.

We’ve also begun onboarding some projects and users to the new platform’s first service offerings (more on that below), and we are already seeing benefits through improved user experiences and easier-to-access resources.

What this new platform will deliver

Refreshed tools and technologies
  • More powerful CPUs, next-generation GPUs, enhanced storage capabilities, and new cloud-native development environments   
  • Integration of previous-generation HPC infrastructure with the latest generation to maintain capacity
Flexibility to meet future growth and evolving research community needs
  • Extensible architecture that we can expand upon and improve incrementally
  • A richer array of services that better respond to diverse computational and data management needs

Highlights of work completed & underway

Join us online for a webinar on 26 November, where we will share a progress update and hold a Question & Answer (Q&A) session.

We’ll share an overview of the new tools and technologies coming online as part of this refresh, as well as a status update of current work in progress. Register here for the Zoom link and bring along any questions you may have.

We’ll share updates on work underway such as...

New infrastructure

As part of this refresh, we are consolidating our HPC and data storage infrastructure around our new Flexible High Performance Cloud (Flexible HPC) environment, housed in the Tāmaki Data Centre at Waipapa Taumata Rau University of Auckland.

All new hardware has been delivered and installation of new power and cooling capabilities in the Tāmaki Data Centre was completed in late October. This enabled us to progress cabling and networking, alongside the design and build of a significant physical and logical network reconfiguration, which will support the new infrastructure and platform tenants into the future.

[Image: compute clusters in the Tāmaki Data Centre in Auckland]
A view from inside the Tāmaki Data Centre at Waipapa Taumata Rau University of Auckland.

 

We’re bringing online more powerful CPUs, next-generation GPUs, enhanced storage capabilities, and new cloud-native development environments.

Our new CPUs – AMD’s 4th Generation “Genoa” nodes – are a step-change more powerful and efficient than Mahuika’s Milan nodes. We’ll be able to better support demanding general purpose workloads as well as specialised uses integrating machine learning and data analysis approaches.

We’re also expanding our GPU capabilities, bringing on new NVIDIA H100 nodes to improve performance and scalability for deep learning applications and large language models.

These will complement our existing A100s and P100s, and further support our strategy to deliver next-generation cloud, data and AI solutions that empower Aotearoa's research sector. Through our wider diversity of offerings, you can get what you need to advance your research.

A few tech specs of what’s coming…

CPU compute:

  • AMD Genoa 9634 84-core 2.25-3.7GHz with DDR5
  • 8 large memory nodes with 1.5TB each
  • AMD Milan
  • Intel Broadwell

GPU and AI nodes:

  • Latest NVIDIA H100 NVL GPU nodes, 192GB HBM3 / node (2x 96GB)
  • Specialist NVIDIA L4 GPU nodes (machine learning inference)
  • NVIDIA HGX A100s, plus PCIe A100s and P100s

 

New storage solutions

As part of this refresh, our storage services are evolving. Our new platform will address diverse data management needs across various research fields, and deliver additional easier-to-use interfaces to scalable long-term storage.

By collaborating with WEKA, Xenon, and Versity, we now have a suite of solutions and hardware that can be mixed and matched to meet the ever-growing needs of the Aotearoa research sector and support research data throughout its lifecycle. Initially, NeSI will be implementing services that our existing users are familiar with, but with immediate performance and usability improvements.

This includes new High Performance Storage from WEKA, which replaces the current General Parallel File System (GPFS). Its automated tiering keeps researchers' most active data easily accessible on our most performant storage. The design and development of the High Performance Storage is well underway, and we expect our OnDemand early access users to be some of the first to benefit from it (see more on that below).

Our other initial storage offering on the new platform is a collaboration with Versity and a replacement for the current Nearline service. Called Freezer, it will initially offer a similar long-term tape-based solution storing a single copy of data. We have begun copying project data over to Freezer and will look at the roadmap for this service once we’ve completed migration.

 

New interactive computing environment

Alongside the storage work above, we’ve also begun migrating NeSI users to our new interactive computing environment – OnDemand. Built on the open-source Open OnDemand product from the Ohio Supercomputer Center, OnDemand is our new platform that enables researchers to access high performance computing via an interactive web or notebook interface. OnDemand provides easy access to tools like Jupyter and RStudio. Additional apps (such as MATLAB, Code Server, and Virtual Desktops) will be released for use over the coming months.

So far, feedback from our first onboarded users has been positive and we’re looking forward to rolling out incremental improvements over the next few months. This initial release of OnDemand does not yet support GPUs and is not connected to the Slurm cluster or high-performance filesystem. 

Below is a glimpse of the new OnDemand environment in action:

 

Project and user migration

Important groundwork is underway to finalise behind-the-scenes but essential capabilities. This includes identity and access management, security assurance, platform observability, and account provisioning. We’re preparing for a supported migration of all users and all data in the new year. Overall, we are prioritising consistent user experiences and aiming to minimise downtime.

How you can prepare for migration:

  • Tidying up your data before migration helps us speed up the process and minimise delays. If there is any data in your /home, /project, or /nobackup directories that can be deleted, please do so in the coming weeks.
  • Please keep an eye out for our emails, as this is how we will contact you about migration and let you know when you need to take action or do something differently.
  • If you’d like to be notified of upcoming early access opportunities and be among the first users onboarded to the new platforms, get in touch.
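
If you are not sure where to start with tidying up, a short script can help you spot candidates for deletion. The following is a minimal sketch, not an official NeSI tool: the function name find_cleanup_candidates and its parameters are hypothetical, and it assumes that large files which have not been modified for a long time are the ones most worth reviewing. It only lists files and never deletes anything.

```python
import os
import stat
import time
from pathlib import Path


def find_cleanup_candidates(root, min_age_days=180, top_n=20):
    """List large, old files under `root` (hypothetical helper).

    Returns up to `top_n` (size_in_bytes, path) pairs for regular files
    whose last modification is older than `min_age_days`, largest first.
    """
    cutoff = time.time() - min_age_days * 86400  # seconds per day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets and other special files
            if info.st_mtime < cutoff:
                candidates.append((info.st_size, str(path)))
    # Largest files first, so the biggest savings appear at the top.
    candidates.sort(key=lambda item: item[0], reverse=True)
    return candidates[:top_n]
```

You could call find_cleanup_candidates on one of your own directories under /home, /project, or /nobackup and print the results to review them. Modification time is only a rough signal, so please check each file before deleting it.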

 

Looking ahead

We are excited about the opportunities our new HPC platform will bring to NeSI and the broader eResearch community. 

As mentioned above, join our webinar on 26 November for more details or to ask questions.

In the meantime, you can reach out to support@nesi.org.nz any time or pop into one of our weekly Online Office Hours to chat with a member of our Support Team.

 
