Parallel processing for ocean life
The Challenge:
Analyse and reconstruct 1.4 billion nucleotide fragments obtained from ocean sediment, to identify their species and gain a better understanding of how microbial life cycles nutrients in the ocean.
The Solution:
Using the expertise of NeSI’s Computational Science Team and the parallel processing capabilities of NeSI’s Mahuika supercomputer to cut down on data processing times.
The Outcome:
Weeks of sample data processing reduced to 48 hours. A better understanding of our oceans and an effective way to target single-species RNA.
Sediment that settles on the ocean floor teems with microbial life that plays a vital role in cycling nutrients in the water.
At the University of Waikato, Dr Alexis Marshall is researching the microbial ecology of marine sediment, analysing its genetic makeup to identify the unique species that make the sediment their home. Her work is part of the Acidification Response of Marine Sediments project, which is funded by the Ministry of Business, Innovation and Employment.
To do this, Alexis used Trinity, a transcriptome assembler for RNA sequencing data. Each sample of sediment had its nucleic acids extracted, leaving a pool of RNA from all the species present. That RNA was sequenced as millions of short fragments, and Trinity then pieced this massive RNA jigsaw puzzle back together, allowing Alexis to identify the species present and their roles in the environment.
“We looked at how microbial communities function in coastal systems. We took a sample and sequenced it, which resulted in a large number of nucleic acid fragments. We used Trinity to take those base pair fragments and rebuild them to make each original piece of RNA in the sample,” she said. “When you do this type of sequencing, you’re not selecting for a species. In one sample you get all the bacteria, all the archaeans, all of the eukaryotes, all the diatoms, all of the jellyfish and mussel pieces breaking down in the sediment and all the viruses associated with all those domains.”
Her research focused on single-stranded RNA, rather than DNA, because of its comparatively short half-life. While DNA can survive for thousands of years, RNA can significantly degrade within minutes or hours. This makes RNA a much better indicator of what is actually living in marine sediment at the time of sampling.
“You don’t know if that DNA comes from something currently living, has recently died or has been there for tens of thousands of years. DNA gives you a potential of what that community looks like. RNA degrades very rapidly in the environment, so when you’re looking at RNA it’s a snapshot of what’s just recently happened.”
Marine sediment is among the most biodiverse environments on Earth, so separating and reconstructing a full picture of all the living microbes from this collective pool of RNA was a massive data challenge. To solve it, Alexis turned to NeSI’s high-throughput supercomputer, Mahuika.
“When we first started doing this work, we were focused on capturing the most abundant species in sediment. Now, because we can get so much more data back, we study nutrient-cycling species that aren’t as abundant,” she said. “We contacted NeSI because we were going from trying to assemble 100,000 individual 150-nucleotide sequences to trying to assemble 1.4 billion. We were having computational issues with memory, but also time.”
Mahuika’s ability to run parallel processing on several hundred CPU cores was key to cutting down this runtime.
“When I contacted NeSI, my question was ‘Trinity has this ability to be broken up into smaller parts. Can we run it as lots of small jobs instead of one really big job?’” said Alexis. “They were able to work with us to make that happen. Now I can get an assembled data set to ask questions in 48 hours instead of three weeks.”
Alexis also called upon Computational Science Team members Chris Scott and Dinindu Senanayake to help get the job done. Chris and Dinindu configured Trinity to split the work into batches of smaller jobs that could run in parallel across Mahuika, so jobs finished sooner and used fewer core hours.
Their method is documented in NeSI Support so other NeSI Trinity users can also take advantage: https://support.nesi.org.nz/hc/en-gb/articles/360000980375-Trinity
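As a rough illustration of the batching idea only (not the procedure in that support article), the sketch below assumes the work has already been split into a plain-text list of independent shell commands, one per line. Trinity's later stage assembles many read partitions independently of each other, which is what makes this kind of splitting possible. The function names and the command-list file are hypothetical. Here the batches run concurrently on a single machine; on a cluster such as Mahuika, each batch would more likely be submitted as its own scheduler job.

```python
# Hypothetical sketch: run many small, independent commands in batches
# rather than as one long serial job. Names and the command-list format
# are assumptions for illustration, not NeSI's or Trinity's own tooling.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_commands(path):
    """Return the non-blank lines of a command-list file (one shell command per line)."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def make_batches(commands, batch_size):
    """Split the command list into consecutive batches of at most batch_size commands."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [commands[i:i + batch_size] for i in range(0, len(commands), batch_size)]


def run_batch(batch):
    """Run the commands in one batch one after another; return how many failed."""
    failures = 0
    for cmd in batch:
        if subprocess.run(cmd, shell=True).returncode != 0:
            failures += 1
    return failures


def run_all(commands, batch_size, workers):
    """Run batches concurrently on up to `workers` threads; return the total failure count."""
    batches = make_batches(commands, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run_batch, batches))


if __name__ == "__main__":
    # Toy stand-in for a real list such as read_commands("commands.txt") (hypothetical file).
    demo = [f"echo partition {i}" for i in range(6)]
    print("failed:", run_all(demo, batch_size=2, workers=3))
```

Threads are enough in this sketch because each command runs in its own child process and the thread only waits for it to finish. Larger batches mean fewer jobs to schedule but less parallelism, so batch size is the main tuning knob.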
Alexis expects to publish her findings early next year. While the data gathered will primarily be used to inform marine ecology (for conservation, marine management and predictions of climate change impacts), the RNA sequencing and assembly techniques run with Trinity may also have wider applications in medicine and agriculture.
These range from identifying single-species responses to medical treatments to building maps of complex plant genomes. Nucleotide reconstruction in all these fields requires huge data processing capabilities, and NeSI’s HPC platform and computational expertise are helping researchers meet this challenge.
For Alexis, that means discovering the bustling microbial world of marine sediment and how ocean systems cycle their nutrients. This kind of research not only helps to keep oceans healthy, it also enables us to better understand how oceans will cope with the changing climate.
---------------------
Do you have an example of how NeSI platforms or expertise have supported your work? We’re always looking for projects to feature as a case study. Get in touch by emailing support@nesi.org.nz.