Laying essential groundwork for launching NeSI's new platforms

Since our last update, we've made exciting progress towards launching NeSI's new platforms at the Tamaki Data Centre.

We've run the first field test of our new compute capabilities, are continuing to copy data to our new storage systems, and are gaining insights from early access users.

Below is a quick update on some of those activities.

Image: the back of a compute rack with wires exposed.

 

Foundational networking in place

As the backbone for communication between our new platform's compute nodes, storage resources, and other services, the networking setup is essential for fast and efficient data transfer, processing, and analysis. 

Working closely with colleagues at REANNZ, much of our work in the data centre lately has focused on installing, configuring, and testing our network infrastructure.

With that foundation in place, we've begun to configure other key components of our new compute and storage infrastructure. 

 

Road testing our new compute nodes

At the end of March we put our hardware setup to its first major challenge, a "burn-in" test.

The cluster ran at full capacity as a quality assurance stress test. This let us benchmark and monitor the power and cooling requirements, as well as identify and address any issues with the network or operating system configurations.

Our next milestone is to get our Support Team onto a test cluster, so they can begin setting up and testing the software and application environments. They'll also get a first crack at experimenting with our newest CPUs – AMD’s 4th Generation “Genoa” nodes – to benchmark and compare them to Mahuika's Milan nodes.

As mentioned in our previous update, these new nodes are a step change in power and efficiency, making them ideal for demanding general purpose workloads and specialised machine learning approaches. They're complemented by new GPU resources, NVIDIA H100 nodes, that improve performance and scalability for deep learning applications and large language models.

Much like our user community, the NeSI team is excited to get onto the cluster and see how these new resources will level up the ways NeSI can support research.

 

Data movement is well underway

In February we began copying project data from Mahuika to our new WEKA high performance file system. In March, we reached an exciting milestone: 1 PB of data migrated!

All project data has been copied over to the new platforms, ready for when the compute resources become available. We're now working on copying nobackup data and we'll continue to sync project data so it remains up to date during this transition phase.

As a replacement for our General Parallel File System (GPFS), the WEKA storage platform provides high performance storage for HPC and AI workloads and supports strong security for sensitive data analysis.

This screenshot from a REANNZ network traffic map shows NeSI project data moving from the GPFS in Greta Point to the WEKA platform at the Tamaki Data Centre.

 

It's a massive effort to copy this amount of data. You can help speed up the process by ensuring we're only moving data that's essential. If you missed our instructions for how to prepare your data, catch up here.

Once all project data has been copied over and the new storage platforms are officially brought online, we will be in touch to help you get started using them. If you missed our deeper dive into what's coming online, you can read an overview here.

This movement of "warm" (active) data complements the migration of "cold" (long-term) data we began prior to Christmas, from Nearline to our new platform Freezer. Powered by Versity, Freezer is a completely redesigned long-term storage service with an easier-to-use and scalable platform that we can extend and further customise over time. 

A Nearline service outage will begin on Monday 14 April so that we can finalise data migration and prepare to welcome researchers onto Freezer. We expect the outage to last about two weeks. If you have regular workflows that use Nearline, or anticipate needing to read or write data to Nearline in April, get in touch with us. For more details about our transition from Nearline to Freezer, click here.

 

Pathway to the new platforms

We've made good progress over the last three months, but there are still key milestones to reach before we're ready for your first logins.

Below is a view of where we sit in our transition to the new platforms:

We’ll keep you updated as specific timelines become clear.

Overall, as we move through these phases of work, we're prioritising consistent user experiences and aiming to minimise downtime. 

 

Growing use of OnDemand 

NeSI's new interactive computing environment, OnDemand, has a growing base of users.

Since our last update, we've migrated 10 projects from Jupyter on NeSI and supported activity from 12 different users. If you'd like to try using OnDemand, get in touch.

We're continually improving the platform's stability and working with our early users to fix minor bugs and learn how to better support their workflows.

In the coming months, we'll connect our OnDemand service to the new platforms' Slurm cluster, high-performance filesystem, and GPUs. 

 

Pilot projects leveraging Research Developer Cloud

Our Research Developer Cloud is currently supporting pilot projects from Manaaki Whenua Landcare Research, GNS, the University of Otago, and the University of Auckland. All of these projects are using automation and DevOps approaches to improve their existing workflows and incorporate machine learning (ML) applications, pipeline development for ML, or back-end development of applications. 

To complement our existing cloud building blocks, we've launched a new Object Storage service. In the coming months, we plan to upgrade the Research Developer Cloud's operating system to Rocky Linux 9.3, which will improve overall performance and stability.

To use our Research Developer Cloud, apply for access here.

 

We're here to support you

If you have any questions about the migration process or the incoming platforms, email support@nesi.org.nz any time. You can also pop into our weekly Online Office Hours sessions to ask questions (big or small!) or work one-on-one with our Support Team to resolve any issues.

We'll share another progress update before the end of May.

 

 
