Job opportunity: NIWA / NeSI seeking an HPC and Cloud DevOps Engineer
NIWA has an exciting opportunity for an experienced Systems Engineer to make a significant contribution to the New Zealand research landscape by working within NeSI.
A key component of NIWA's research infrastructure is the High Performance Computing Facility (HPCF). It supports research in climate, chemistry-climate, weather, river flood, sea state, ocean, sea-level, and inundation modelling and forecasting, as well as an operational environmental forecasting system (EcoConnect).
Within NeSI, the system also supports the High-Performance Computing (HPC) requirements of researchers across the national research community. The core HPC platforms were recently refreshed with the deployment of a capacity cluster (HPE/Cray CS400 and CS500) and a capability supercomputer (HPE/Cray XC50) in a tightly integrated design, sharing core high-performance filesystems, nearline storage, and related operational infrastructure.
You will support and further develop an HPC infrastructure and platform that delivers a wide range of services. You will help ensure the delivery of a flexible and robust computational environment for scientific research, whilst maintaining a stable platform for NIWA operations and NeSI's business systems. Our team is trusted to build the systems behind our newest cutting-edge platforms, which power the New Zealand research community.
As part of a small, ambidextrous team, you will at times need to work on all key components of NIWA/NeSI's HPC platforms, so there are significant professional development opportunities for the right candidate.
We are looking for the following skills and experience:
- A recognised relevant tertiary qualification, e.g., in computing and/or other sciences.
- At least five years' experience in the operation and implementation of services based on Linux and/or Unix, preferably including some mission-critical services.
- Familiarity with Python.
- Understanding of container technologies and the management of Container-as-a-Service (CaaS) platforms.
- Experience with Kubernetes.
- Experience applying CI/CD tools and techniques (Ansible, Jenkins, etc.) and DevOps methodologies to the management of infrastructure.
- Excellence in troubleshooting and root cause analysis, and a cool head in the face of technical problems and challenges.
In addition to the above, experience in some or all of the following will be advantageous:
- HPC platforms, HPC interconnects/fabrics, IBM Spectrum Storage / GPFS, IBM Spectrum Protect, Slurm, Bright Cluster Manager, database administration, data management and system integration services.
- Management and deployment of OpenStack based infrastructure and services.
- Working knowledge of the build, management, and runtime of HPC software and related operating system libraries and services.
- Understanding of IT security concepts and willingness to implement flexible solutions that support them.
- Experience with relevant version control systems and scripting, and some knowledge of programming languages.
To view the full job posting or to apply online, visit the Science New Zealand Careers page. The deadline to apply is 01 November 2020.