Plugging memory leaks in a hydrology code
This case study shares some of the technical details and outcomes of the scientific and HPC-focused programming support provided to a research project through NeSI’s Consultancy Service.
This service supports projects across a range of domains, aiming to lift researchers’ productivity, efficiency, and skills in research computing. If you are interested in learning more or applying for Consultancy support, visit our Consultancy Service page.
Research background
TopNet is a hydrology code developed at NIWA (the National Institute of Water and Atmospheric Research), which uses concepts of runoff generation controlled by subsurface water storage and topography.
It runs a water balance model within each sub-catchment to simulate water flow, soil moisture, lake levels, and discharge in rivers and streams over time, taking into account precipitation, snow, evaporation, and plant transpiration.
TopNet hillslope hydrological model conceptualisation (image from Bandaragoda et al., 2004).
Bandaragoda, C., D. Tarboton, and R. Woods, 2004. Application of TopNet in the distributed model intercomparison project, Journal of Hydrology, 298(1-4), 178-201. https://doi.org/10.1016/j.jhydrol.2004.03.038
Project challenges
The memory consumption of TopNet can grow to as much as 900 GB for large catchments. Runs that simulate multiple decades are also computationally very expensive. NIWA Research Software Engineer Yinjing Lin reached out to NeSI for support in overcoming these challenges.
What was done
NeSI Research Software Engineers Alex Pletzer and Chris Scott worked with Yinjing to:
- Understand the memory requirements of the code
- Devise ways to reduce the memory footprint
- Apply other performance improvements to the code
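The first step, understanding how memory use evolves over a run, can be illustrated with a minimal Python sketch. It is not TopNet code and not necessarily how the NeSI team profiled the model: the toy loop, the `run_steps` function, and the `leak` flag are all hypothetical. It samples traced memory after every time step, so that a footprint that keeps growing with the step count stands out from one that stays flat.

```python
import tracemalloc


def run_steps(n_steps, leak):
    """Toy time-stepping loop (hypothetical, not TopNet).

    Returns the traced memory in bytes sampled after each step.
    """
    history = []  # only grows when leak=True
    samples = []
    tracemalloc.start()
    for step in range(n_steps):
        state = [float(step)] * 10_000  # per-step work array
        if leak:
            # Every step's array is kept alive, so memory grows with n_steps.
            history.append(state)
        current, _peak = tracemalloc.get_traced_memory()
        samples.append(current)
    tracemalloc.stop()
    return samples


leaky = run_steps(50, leak=True)
fixed = run_steps(50, leak=False)
print(f"leaky: {leaky[0]} -> {leaky[-1]} bytes")
print(f"fixed: {fixed[0]} -> {fixed[-1]} bytes")
```

Note that `tracemalloc` only sees allocations made through Python's allocator. For a compiled model, the same idea would apply to the resident memory of the process, sampled at regular intervals during the simulation.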
The figure below shows the gradual increase in memory use during the simulation.
Main outcomes
- A 20x reduction in memory requirements for a 100-year run.
- Before NeSI worked on the project, some simulations were requesting hundreds of GB of memory. Nodes with this much memory are available on NeSI platforms; however, only a few nodes can accommodate such requests, so jobs would sometimes wait in the queue for a long time.
- By the end of the project, the memory footprint had been reduced to the point where it is easy to obtain enough computing resources. TopNet can now run on the default Mahuika partition without waiting in a long queue for the hugemem partition, which has only 4 nodes.
- Up to 40% performance improvement.
Researcher feedback
"We are very grateful to the NeSI team for helping me locate and fix memory leak issues and a performance bottleneck in TopNet. Without these fixes, for commercial projects and existing research projects aiming to develop climate change impact assessments (including SSIF), we would have been unable to complete the work on time and on budget."
- Dr Yinjing Lin, Research Software Engineer, NIWA
Do you want to bring your research to the next level? We can help. Send an email to support@nesi.org.nz to learn more about our Consultancy support or visit our Consultancy Service page.