Protecting New Zealand's biodiversity through collaboration
Collaborators at ESR, Landcare Research and DOC have been working together to understand the health of Mystacina tuberculata bats. Bats are New Zealand’s only native terrestrial mammals. This work could also help to address the longstanding question of whether coronaviruses have evolved recently or are much more ancient.
Richard Hall, Matthew Peacey, Nicole Moore and Jing Wang from ESR, Dan Tompkins and Dan White from Landcare Research and Kate McInnes from DOC were involved in this work. Richard and Jing made use of NeSI’s HPC platforms and the expertise of Sung Bae, from the Computational Science Team, to speed up their part of the project.
Richard and Jing worked with genetic sequencing data, attempting to identify any novel viruses in samples collected by the others in the team. They used NeSI’s HPC platforms to examine the sequencing results and see whether they contained any unique genetic material.
Richard explains why a powerful computing system is necessary for this kind of work: “Finding viruses is never easy. Unlike everything else, like humans, mammals, vertebrates or even bacteria, there is no common shared genes between two viruses. That means, in order to identify a virus, we need to compare our sequence against every other known sequence. That is a huge task.”
The research team uncovered a new Alphacoronavirus. This may prove to be an important result for the health of this special member of New Zealand’s ecosystem. As bats are New Zealand’s only native terrestrial mammals and are also quite geographically isolated, this finding may indicate that coronaviruses are indeed ancient, perhaps millions of years old, although much more research is needed to confirm this.
The research
The study involved collecting bat guano from four sites on Whenua Hou (Codfish Island), off the west coast of Stewart Island. This material was sequenced by NZ Genomics, Ltd., and the sequencing results were then processed by the ESR staff.
Next-generation sequencing produces vast quantities of data. ESR compared the bat dataset to all other known nucleotide sequences in GenBank, a worldwide database of genomic information. The tool used to carry this out is called the Basic Local Alignment Search Tool (BLAST) and is available at NeSI.
Scaling BLAST
BLAST is a common tool used within molecular biology. It enables researchers to compare nucleotide or protein sequences from multiple sources. Similarities between sequences can be used to shed light on gene function, possible new gene families and evolutionary history. BLAST+ is a new version that provides significant speed improvements.
To get the most out of BLAST+, it is important to split large queries into batches. This splitting can present challenges when transitioning to an HPC system, due to the complexities of job scheduling. NeSI staff member Sung Bae worked with Richard and Jing to implement a job submission workflow that delivered linear speedup when running up to several thousand batches at a time.
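To make the batching idea concrete, here is a minimal Python sketch of splitting a FASTA query set into fixed-size batches and building one BLAST+ command line per batch. It is an illustration only, not the workflow that Sung, Richard and Jing implemented. The function names, the file names and the database name `nt` are hypothetical, and the job scheduler side is left out entirely.

```python
from typing import Iterable, Iterator, List, Tuple

# A record is (header without the leading '>', full sequence).
Record = Tuple[str, str]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def batch_records(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most batch_size records each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def blastn_command(query_path: str, db: str, out_path: str) -> List[str]:
    """Build (but do not run) a blastn command for one batch file.

    The database name passed in is an assumption for illustration.
    """
    return ["blastn", "-query", query_path, "-db", db,
            "-out", out_path, "-outfmt", "6"]


if __name__ == "__main__":
    sample = ">read1\nACGT\nAC\n>read2\nGG\n>read3\nTT\n".splitlines()
    for i, batch in enumerate(batch_records(parse_fasta(sample), 2)):
        # Hypothetical per-batch file names.
        print(len(batch), " ".join(blastn_command(f"batch_{i}.fa", "nt", f"batch_{i}.tsv")))
```

Because each batch is searched independently, the resulting commands can be submitted as separate jobs, which is what makes this kind of workload a natural fit for an HPC scheduler.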
Concurrently, the team worked to understand whether compiler options could enable the software to run faster. By profiling, benchmarking and experimenting, Sung was able to provide a 2.7x speedup without touching a line of source code.
Collaborative Projects
NeSI’s Computational Science Team spends time working with research groups to understand their needs and then works intensively as a partner to remove roadblocks, develop algorithms or do whatever needs doing to speed research along. The team is available nationally to every research group that has a project with NeSI under the Research Allocation Class. In one case, sitting down with the team resulted in a 1,300-fold increase in speed per run.
Read More
- Hall RJ, Wang J, Peacey M, Moore NE, McInnes K, Tompkins DM. New alphacoronavirus in Mystacina tuberculata bats, New Zealand. Emerg Infect Dis [Internet]. 2014 Apr [date cited]. doi:10.3201/eid2004.131441