What’s the link between marriage rules and high performance computing?
The high performance computing resources of New Zealand eScience Infrastructure (NeSI) are enabling a Massey researcher to solve complex biological problems. “Our challenge is to make a very fast program and run it in parallel,” says Elsa Guillot, a PhD student at the Institute of Molecular BioSciences at Massey University, Palmerston North. “Our simulations are optimised but even so, we need huge computational power and to run thousands of simulations. That’s where NeSI comes in.”
Guillot works in the Computational Biology Research Group which, as a result of a merger of two institutes last year, is now part of the Institute of Fundamental Sciences. Her supervisor is Associate Professor Murray Cox, also of the Computational Biology Research Group at Massey University and an inaugural Rutherford Fellow of the Royal Society of New Zealand. She has a background in data modelling, mathematics, statistics and computer programming, but also needs to understand how genetic data works, and how it is transmitted and evolves. Guillot’s current project combines history, humans and models, and relates to hard-won anthropological data.
“We reconstruct the history of populations from DNA, using models we simulate in computers,” she says. “I’m interested in reconstructing the social behaviours of populations and how it can affect the genetics of those populations.”
The modelling of human behaviour at the interface between biology, statistics and computer science creates large genetic datasets that can only be successfully analysed using high performance computing, especially when researchers wish to include social behaviours in their results.
“If you don’t include social behaviour you only need to reconstruct the population’s ancestry, the family trees – as you go back in time you have fewer and fewer individuals,” says Guillot. “We need the whole population to see how they interact and how paternity differs, making the modelling much slower and more complex.”
Guillot’s simulations relate to specific social behaviour such as marriage rules that affect successive generations. “Marriage rules affect genetics because genetic information is transmitted from parents to children. Strong marriage rules can affect the genetics of an entire population.” This is particularly true of old, small and isolated populations, says Guillot, where a strong rule of marriage among a population’s ancestors can have dramatic genetic effects on its descendants.
“In human studies the data is complicated to collect,” Guillot explains. The Massey team works with anthropologists who collected electronic data over a 10-year period among remote populations in eastern Indonesia. “That’s mainly what we’re going to base our study on but we could eventually use other populations. We already have projects running on that data and until now my part in it has largely related to the theory.”
Anthropologists and historians are interested in the effects of strong social behaviour such as the application of marriage rules in global populations. Incest taboos and similar marriage rules among small populations have led geneticists to detect patterns in genetic data they believe may be linked to such rules, but no one has yet analysed them in detail, says Guillot. “We’re interested in quantifying this and, if the signals are strong enough, eventually we’d like to know if we can see the pattern of social behaviour emerging – reconstruct the past and see the history of this social behaviour.”
How HPC helps
Access to the NeSI high performance computers helps Guillot carry out her research to a standard that wasn’t previously possible, she says. NeSI is an e-research infrastructure serving researchers nationwide. Its supercomputers are based at the University of Auckland, the University of Canterbury and NIWA. NeSI is unique in New Zealand because of its diversity of computing architectures and high levels of technical support. For technical specifications, see /services/high-performance-computing-and-data-analytics/platforms.
In the past, genetic models such as those used by Guillot and the Massey team couldn’t be applied to actual populations. Such simulations need to reach deep into the past, the further the better, to find the small, isolated populations required for accurate analysis. “It’s very important we don’t concentrate on current systems, which is far too complex because populations are large and include massive migration,” says Guillot.
In some populations, men stay in their villages and women move from village to village to marry them, a system called patrilocality. But anthropologists think it’s possible that historically it was the other way round, with men moving to their prospective wives’ village. “Maybe, we really don’t know yet, it’s possible to see that in the genes. That would give us an insight into the history of their genetics.”
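To illustrate why the direction of marriage migration might leave a genetic trace, here is a minimal toy sketch in Python. It is not the Massey team's model: the function name `simulate_villages`, its parameters and the "foreign share" summary are all assumptions made for this example. It relies only on the standard facts that mitochondrial DNA is inherited from the mother and the Y chromosome from the father, and it labels each lineage with the village of its founding ancestor.

```python
import random

def simulate_villages(movers="women", n_villages=4, n_per_sex=25,
                      generations=30, move_rate=0.2, seed=0):
    """Toy model (hypothetical): return (mt_foreign, y_foreign), the share of
    maternal and paternal lineages found outside their village of origin."""
    rng = random.Random(seed)
    # Each label records the village of origin of the founding ancestor.
    mt = [[v] * n_per_sex for v in range(n_villages)]  # lineages carried by women
    y = [[v] * n_per_sex for v in range(n_villages)]   # lineages carried by men

    def relocate(groups):
        # Each person of the moving sex marries into another village
        # with probability move_rate, otherwise stays at home.
        out = [[] for _ in groups]
        for home, group in enumerate(groups):
            for label in group:
                if rng.random() < move_rate:
                    dest = rng.choice([v for v in range(n_villages) if v != home])
                else:
                    dest = home
                out[dest].append(label)
        # Toy-model guard: if a village is left empty, keep its previous residents.
        return [new or old for new, old in zip(out, groups)]

    def reproduce(groups):
        # Children are born where their parents live and inherit the lineage
        # of a randomly chosen resident parent of the relevant sex.
        return [[rng.choice(group) for _ in range(n_per_sex)] for group in groups]

    for _ in range(generations):
        if movers == "women":      # patrilocality: wives move, husbands stay
            mt = relocate(mt)
        else:                      # the reverse system: husbands move
            y = relocate(y)
        mt, y = reproduce(mt), reproduce(y)

    def foreign_share(groups):
        total = sum(len(g) for g in groups)
        foreign = sum(label != v for v, g in enumerate(groups) for label in g)
        return foreign / total

    return foreign_share(mt), foreign_share(y)
```

Calling `simulate_villages("women")` and `simulate_villages("men")` gives opposite patterns: under the first, paternal lineages never leave their village while maternal lineages mix, and under the second the roles swap. Real data are far noisier, which is why the question in the article remains open.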
Forward-in-time simulations
Until recently no one had used so-called forward-in-time simulations, which mimic populations undergoing evolutionary processes including mutation, natural selection and migration, to study human genetic patterns. In such simulations, time moves as it does in the real world, which adds considerably to the complexity of the modelling.
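The idea of a forward-in-time simulation with a marriage rule can be sketched in a few lines of Python. This is an illustrative toy, not SMARTPOP and not its algorithm: the `Person` record, the sibling-marriage ban and all function names are assumptions chosen for the example, and mutation and selection are left out. Every generation the whole population is kept, couples are formed subject to the rule, and children inherit a maternal and a paternal lineage label from their parents.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    family: int  # index of the parental couple (-1 for founders)
    mt: int      # maternal lineage label, inherited from the mother
    y: int       # paternal lineage label, inherited from the father (-1 for women)

def are_siblings(a, b):
    return a.family != -1 and a.family == b.family

def form_couples(women, men, rng, forbid_siblings, max_tries=10_000):
    """Pair every woman with a random man; redraw the pairing if the rule is broken."""
    for _ in range(max_tries):
        husbands = rng.sample(men, len(men))  # a random permutation of the men
        couples = list(zip(women, husbands))
        if not forbid_siblings or not any(are_siblings(w, m) for w, m in couples):
            return couples
    raise RuntimeError("no rule-compliant pairing found; population may be too small")

def step(women, men, rng, forbid_siblings):
    """Advance one generation, keeping the population size constant."""
    couples = form_couples(women, men, rng, forbid_siblings)
    daughters, sons = [], []
    for _ in range(len(women)):
        fam = rng.randrange(len(couples))
        mother, _father = couples[fam]
        daughters.append(Person(fam, mother.mt, -1))
    for _ in range(len(men)):
        fam = rng.randrange(len(couples))
        mother, father = couples[fam]
        sons.append(Person(fam, mother.mt, father.y))
    return daughters, sons

def simulate(n_per_sex=50, generations=20, forbid_siblings=True, seed=1):
    rng = random.Random(seed)
    # Founders each start their own lineage.
    women = [Person(-1, i, -1) for i in range(n_per_sex)]
    men = [Person(-1, n_per_sex + i, i) for i in range(n_per_sex)]
    for _ in range(generations):
        women, men = step(women, men, rng, forbid_siblings)
    return women, men

def count_lineages(women, men):
    """Number of distinct maternal and paternal lineages still present."""
    return len({p.mt for p in women + men}), len({p.y for p in men})
```

Running `count_lineages(*simulate())` with and without `forbid_siblings` gives a crude way to compare how a rule changes the surviving lineages. Because every individual is tracked in every generation, the cost grows with population size times the number of generations, which is part of why this style of modelling is slower than reconstructing ancestry alone.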
Guillot’s self-developed software for running such simulations is SMARTPOP: Simulating Mating Alliance as a Reproductive Tactic for POPulations. “It’s a software designed to look specifically at mating systems like marriage rules in populations,” she says. It was developed as an integral part of her PhD. For the past year or so Guillot has been developing and validating it. “You can develop something but then you need to prove, especially to biologists and anthropologists, that what you’re doing is meaningful and that it’s comparable to previous methods.”
The development and validation process has been tough, Guillot acknowledges, and required much forethought. “Once you start developing the program it has a shape and a way of functioning that you can’t really change afterwards. You never spend enough time on it – but then, if you spend all your time thinking you wouldn’t get anything else done. My next paper will be about this software development and validation.”
Previously Guillot ran SMARTPOP on Massey’s relatively small servers. “Here I have access to a server with 16 cores – it’s comparable with using 16 PCs at once. But with NeSI it’s more like hundreds. With NeSI the possibilities are almost unlimited.”
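Independent simulation replicates are a natural fit for this kind of scaling, because each run needs only its own random seed and can be handed to a separate core. The following Python sketch shows the general pattern under stated assumptions: `run_replicate` is a stand-in toy (a simple drift model that counts surviving lineages), not SMARTPOP, and `run_batch` with its `workers` parameter is a hypothetical helper built on the standard library's `concurrent.futures`.

```python
import random
from concurrent.futures import ProcessPoolExecutor

def run_replicate(seed, pop_size=100, generations=50):
    """One independent toy replicate: lineages are resampled each generation;
    returns how many founder lineages survive to the end."""
    rng = random.Random(seed)  # a private generator makes the run reproducible
    lineages = list(range(pop_size))
    for _ in range(generations):
        lineages = [rng.choice(lineages) for _ in range(pop_size)]
    return len(set(lineages))

def run_batch(seeds, workers=1):
    """Run one replicate per seed, serially or spread across worker processes."""
    if workers == 1:
        return [run_replicate(s) for s in seeds]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map preserves the order of the seeds in the returned results
        return list(pool.map(run_replicate, seeds))
```

On a 16-core server one might call `run_batch(range(1000), workers=16)` from a script's main guard. On a cluster the same idea is usually expressed through the job scheduler, with each job given its own range of seeds. Since results depend only on the seed, the serial and parallel paths return the same values.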
Until now, the team hasn’t needed to run a large number of simulations, she says. “It’s really now that we need NeSI and we have it, so that’s great.” Guillot had an opportunity to meet the NeSI team at the New Zealand HPC Applications Workshop, held alongside the annual New Zealand eResearch Symposium. This allowed her to gather knowledge on parallel programming, she says: “I didn’t have much experience with parallel computing before and the NeSI team got me thinking about it a lot.”
She says the Massey team finds using NeSI relatively inexpensive. “There’s always the problem of how much money you can get to run your project,” she says. “NeSI has been very good for us. We looked at a few options – there aren’t that many available, to be honest, and they’re hard to find. We looked at Amazon Web Services but, for us, NeSI was definitely cheaper.”
Guillot says she’s also been impressed with the “fast and efficient” support she’s received from the NeSI team. “Whenever I’ve needed the NeSI team to install something or needed help to make things work, they’ve always been there. I think NeSI’s great.”
Future proof
In the future the team will be able to benchmark SMARTPOP’s performance on NeSI against that on Massey’s servers by comparing how fast its results become available. “A further measure of success is that we want to reuse this program as open access software, and how many people use this software internationally will be interesting – these kinds of genetics questions are being looked at by a few teams around the world.”
SMARTPOP will eventually be freely available. “We’ll improve it as much as we can, then people can make their own system with it. For that, open access and open source are necessary.” Publications are another goal. “We’d like to publish this software within the year,” says Guillot. “Hopefully, for my PhD, I’ll publish another paper using this software, and the idea is that once this software is published we’ll try different theoretical systems and get as many publications as possible – and other people using this software and publishing, as well.” Although commercial applications for SMARTPOP are unlikely, Guillot says her mathematical and computational methods may be harnessed in unforeseen ways or inspire research by others.