Learning more about the volume and value of Aotearoa's research data
Over the last eight months, a growing coalition has been gleaning new insights about the data in Aotearoa New Zealand's research sector. Working as a community with Research Data Culture Conversation (RDCC) colleagues from Australia, we’ve been exploring pain points in how well we manage research data. We’ve captured our insights in Aotearoa's first "Macro View" of research data.
Early insights
Following a kick-off hui at eResearch NZ 2023, we worked as a sector to gather data on a common set of questions. In September we explored preliminary results of engagement with 14 New Zealand research institutions.
We estimated ~45 PB of research data was being stored across the sector as at December 2022. Imagine 20 times the content of all US academic research libraries¹, nearly 4 million hours of HD video², or two Wayback Machines³ (the Internet's digital archive) worth of data. The challenge that got us all around the table is a constant sense of a data tsunami, and a need to support rapid and relentless growth in how we store and manage research data, year upon year.
Insights from our first attempt at a Macro View:
- It is difficult to answer the questions with a high degree of accuracy
- Research data isn't owned by any one part of an institution, so coordination across multiple teams and sources is critical to get answers and trigger action
- Research Data Management (RDM) plans are often static documents that aren't being used over time, even within fairly mature research institutes
- Categorising data as "sensitive" is difficult without a common definition
- Reporting for specific science domains (other than weather / climate data) is not possible
- Māori Data Governance is a key aspect of research data in Aotearoa, and is currently difficult to measure
- Many organisations were in the process of data management audits or system migrations, so it was timely to be having these conversations.
- Shared problems can lead to positive and collaborative approaches to solving them. We've benefited from leveraging the experience and expertise of our Australian RDCC colleagues, and we're continuing to build knowledge in this space through participation in Research Data Alliance Working Groups.
Our preliminary results (as a presentation) are available to view here. We shared these results alongside RDCC's Australian Macro View as side-by-side posters at eResearch Australasia 2023 in October, winning a prize for our poster presentation. This follows an earlier award-winning talk by the RDCC team at the annual THETA conference - watch a recording of the RDCC THETA talk.
From there we attended International Data Week 2023, joining the French Open Science Monitor and other groups talking about national approaches. None had attempted to quantify research data as we have, which again led to interesting discussions. The slides from our IDW talk are available to view here.
Provocative questions
Overall, this process has sparked novel reflections and discussions. Seemingly simple questions like "How much data is being stored? What kind of data is it? How accessible is it?" have been difficult to answer. Research data is considered a valuable asset, yet very little is known about the data we keep.
This invites further consideration of research data's value and the culture we have around managing it. It has also prompted some provocative questions, such as:
- If we're keeping huge volumes of data that we don't know a lot about, is it actually worth keeping?
- How much do we need to know about data before we delete it?
- How does data management work in reality vs. idealised approaches in RDM plans?
- Our Australian colleagues predict that data volumes could double every three years. Is Aotearoa in a similar boat, with a data tsunami looming on the horizon?
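To put the doubling scenario from the last question into numbers, here is a minimal Python sketch. It assumes steady exponential growth, which is a simplification, and it applies the Australian doubling period to Aotearoa's ~45 PB estimate purely for illustration. We don't yet know whether that rate holds here. The function names are hypothetical.

```python
# Illustrative only: assumes smooth exponential growth with a fixed
# doubling period. Real storage growth is unlikely to be this tidy.

def annual_growth_rate(doubling_years: float) -> float:
    """Yearly growth rate implied by a given doubling period."""
    return 2 ** (1 / doubling_years) - 1


def projected_volume(start_pb: float, years: float, doubling_years: float) -> float:
    """Volume in PB after `years`, starting from `start_pb`."""
    return start_pb * 2 ** (years / doubling_years)


if __name__ == "__main__":
    start_pb = 45.0        # sector estimate as at December 2022 (from the text)
    doubling_years = 3.0   # Australian prediction, assumed here for Aotearoa

    rate = annual_growth_rate(doubling_years)
    print(f"Implied annual growth: {rate:.1%}")

    for years in (3, 6, 9):
        volume = projected_volume(start_pb, years, doubling_years)
        print(f"After {years} years: {volume:.0f} PB")
```

Under these assumptions, doubling every three years works out to roughly 26% growth per year, and 45 PB would become 90 PB after three years and 360 PB after nine.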
Conversation starter or reality check? These two graphs were circulated on cards at eResearch Australasia 2023, prompting people to learn more about insights from the ARDC-funded Institutional Underpinnings extension project.⁴
Join the conversation
We are grateful for everyone’s engagement in the Research Data Culture Conversation so far. We’re inviting anyone interested to join us at eResearch NZ 2024 (7-9 February 2024, Wellington) where we hope to share a 2023 Macro View for Aotearoa – see below for ways you can contribute.
We'll also be continuing reflections and discussion in this space through a variety of sessions:
- “Making research data count” and “Unravelling the data lifecycle” – NeSI is co-hosting two BoF sessions with Australian RDCC colleagues to further develop our shared understanding and explore possible actions from our national macro views. Roger Lins, Associate Director of Research Infrastructure at Waipapa Taumata Rau, University of Auckland, is also a co-author of the "Unravelling the data lifecycle" session.
- “Managing research data at scale: addressing the growing data challenge together” – David Jung of the University of New South Wales will be talking about the Institutional Underpinnings project, funded by the Australian Research Data Commons (ARDC). This is very relevant to our kōrero as well.
Early bird registration rates for eResearch NZ end on 15 December.
A 2023 NZ Macro View
We are again asking for community support as we aim to collate a second snapshot of our research data holdings as at December 2023, in time for eResearch NZ 2024.
Participation Questions
Please provide your estimated institutional:
- Volume of first copy content held for future access in managed services
- Volume of data openly discoverable through services provided by the institution
- Volume of first copy content in services promoted as suitable for sensitive data – meeting ethics and privacy requirements
- Total storage volume consumed to support the above research data (including replications, etc.)
- Volume of data under Indigenous governance
While the 2022 data gave us our first NZ Macro View, this second measure supports an estimate of research data growth too, and hopefully increases the accuracy and richness of insights gleaned.
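As a rough illustration of how two snapshots support a growth estimate, the sketch below derives an annual growth rate and an implied doubling time from a start and end volume. It assumes compound growth between the two dates. The 2023 value is a made-up placeholder, not a survey result, and the function name is hypothetical.

```python
import math
from typing import Optional, Tuple


def growth_between_snapshots(
    start_pb: float, end_pb: float, years: float = 1.0
) -> Tuple[float, Optional[float]]:
    """Return (annual growth rate, doubling time in years).

    Assumes compound growth between the two snapshots. The doubling
    time is None when the volume did not grow.
    """
    if start_pb <= 0 or end_pb <= 0 or years <= 0:
        raise ValueError("volumes and interval must be positive")
    annual_rate = (end_pb / start_pb) ** (1 / years) - 1
    if annual_rate <= 0:
        return annual_rate, None
    doubling_years = math.log(2) / math.log(1 + annual_rate)
    return annual_rate, doubling_years


if __name__ == "__main__":
    volume_2022_pb = 45.0              # estimate as at December 2022 (from the text)
    hypothetical_2023_pb = 56.0        # PLACEHOLDER ONLY, not a measured figure

    rate, doubling = growth_between_snapshots(volume_2022_pb, hypothetical_2023_pb)
    print(f"Annual growth: {rate:.1%}")
    if doubling is not None:
        print(f"Implied doubling time: {doubling:.1f} years")
```

With only two data points, any rate calculated this way is sensitive to estimation error in either snapshot, so it is best read as indicative rather than precise.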
We appreciate that December figures can’t be confirmed until the new year, and that there isn’t much time between everyone getting back in January and the eResearch NZ conference in February, so we’re open to preliminary responses based on the current state. We’re hoping this offers some useful flexibility so that your organisation can continue contributing to our macro view initiative.
For further information and to get your organisation involved, please email Claire Rye, NeSI Product Manager - Data Services.
1 https://mynasadata.larc.nasa.gov/basic-page/data-volume-units
2 https://www.globus.org/blog/100-petabytes-moved
3 https://www.lifewire.com/terabytes-gigabytes-amp-petabytes-how-big-are-they-4125169
4 Insights have been shared through two reports: D. Jung et al., ‘Business Intelligence and Reporting of Research Data’, Zenodo (2023), https://doi.org/10.5281/zenodo.10076883 and D. Jung et al., ‘Retention and Disposal of Research Data: from current to best practices’, Zenodo (2023), https://doi.org/10.5281/zenodo.10076891