Scaling up research data conversations

As part of a special panel at the Australasian eResearch Organisations (AeRO) 12th National eResearch Forum in Canberra, NeSI Director Nick Jones and colleagues from the Research Data Culture Conversation (RDCC) discussed our collective research data challenges, the work of the RDCC, and what is next for characterising Aotearoa New Zealand and Australia's research data at scale.

RDCC have been exploring questions around Australian research data since 2021, building on earlier national conversations that started in 2017 and were sparked by a simple yet fundamental question:

What is the volume of unique data managed by Australian research sector institutions for the purpose of future access?

To answer with a 'Macro View', RDCC consulted more than 65 research institutions across universities, medical research institutes, and research infrastructures. Results of their interviews and survey suggested a research data volume in Australia of 300 Petabytes (PB) at the end of 2021 with a possible growth rate of doubling every three years.
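To make the growth figures concrete, here is a minimal sketch of the arithmetic, assuming smooth exponential growth from the 300 PB estimate. The function names are illustrative only and are not part of RDCC's method. It also converts between a doubling period and a compound annual growth rate, which helps when comparing "doubling every three years" with the 30-40% annual rates mentioned in the insights.

```python
import math

BASELINE_PB = 300.0    # RDCC estimate for the end of 2021
DOUBLING_YEARS = 3.0   # the "possible" doubling period reported by RDCC


def projected_volume_pb(years_after_2021: float) -> float:
    """Volume in PB, assuming the baseline doubles every DOUBLING_YEARS years."""
    return BASELINE_PB * 2 ** (years_after_2021 / DOUBLING_YEARS)


def annual_growth_rate(doubling_years: float) -> float:
    """Compound annual growth rate implied by a given doubling period."""
    return 2 ** (1 / doubling_years) - 1


def doubling_time_years(annual_rate: float) -> float:
    """Doubling period (in years) implied by a compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    print(f"End of 2024: {projected_volume_pb(3):.0f} PB")
    print(f"Doubling every 3 years = {annual_growth_rate(3):.1%} per year")
    for rate in (0.30, 0.40):
        print(f"{rate:.0%} per year doubles in {doubling_time_years(rate):.1f} years")
```

Under these assumptions, doubling every three years works out to about 26% growth per year, while annual growth of 30-40% corresponds to doubling roughly every 2 to 2.6 years.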

Macro View insights

The RDCC team identified four key insights during the course of their work:

  1. The Yin and Yang of Data: There has been a long-term focus on sharing, preservation, and re-use, while little attention has been paid to the resourcing, sensitivity, and end-of-life of research data

  2. Research data management planning needs to evolve from a static, upfront process to an active, service-oriented, automated system

  3. Research data is itself dynamic and ambiguous; it doesn’t have a lifecycle so much as it transitions and transforms through decision points

  4. The anticipated data tsunami hasn’t arrived; we’re only seeing Compound Annual Growth Rates of 30-40%

A screenshot of a slide outlining the four RDCC insights.
Each panel of the image above represents one of the four key insights uncovered through RDCC's work.

 

Sharing lessons learned with Aotearoa

Fast forward to February 2023, when NeSI invited Ai-Lin Soo, Rhys Francis, and Luc Betbeder-Matibet from RDCC to host a workshop at eResearch NZ.

The session aimed to translate RDCC's approach and lessons learned in order to spark a similar initiative in Aotearoa New Zealand. We heard from AgResearch, Manaaki Whenua Landcare Research, and Waipapa Taumata Rau, University of Auckland on their respective journeys.

Participants then explored the shared problem space of the waiting data tsunami and the challenge of not knowing what volume of data currently exists within the national research sector and how much more could be on the horizon. Slides used for discussion points in the session can be viewed here.

As a nascent community, we are now preparing for the first Macro View estimate of research data volume for Aotearoa New Zealand. To allow direct comparison with our Australian counterparts, our consultation will ask the same key questions asked of the 65 institutions RDCC surveyed in 2021.

Alongside those questions, we'll also seek to understand the Indigenous Data Sovereignty of research data held by New Zealand research institutions, which adds a crucial Aotearoa perspective. This process will therefore need kōrero (conversation) with, and guidance from, Māori to refine how best to engage with this topic, with the aim of giving a view of Māori data sovereignty and Te Tiriti fulfilment within research data held in New Zealand.

Help us compile a Macro View of research data in Aotearoa New Zealand 

We are keen to include as many research institutes in Aotearoa New Zealand as possible in order to build an accurate Macro View of data up to 31 December 2022.

We will be using the following questions to do this:

PARTICIPATION QUESTIONS

 

Please provide your estimated:
 

  1. Volume of first copy content held for future access in managed services
     

  2. Volume of data openly discoverable
     

  3. Volume of first copy content in services promoted as suitable for sensitive data – meeting ethics and privacy requirements
     

  4. Total storage volume consumed to support the above research data (including replications, etc.)

 

 

Definition of terms:

 
  • Volume: Terabytes (TB)

  • First copy content: For 1), 2), and 3), only count the first copy of the content as presented on managed services (do not count the backup or replication / redundancy copies)

  • Managed service: Service for which an institution provides stewardship. We acknowledge that data is held in services not managed at the organisational level.

  • Openly discoverable: Includes data which may not be openly accessible.

  • Total storage volume: First copy content as well as backups, redundancies and/or replications.

  • Content: Acknowledges that there is difficulty distinguishing between what is research data and what is other files within the system.
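As an illustration of how the definitions above map onto the four questions, here is a minimal sketch that tallies the answers from a hypothetical inventory of managed services. The `Service` record, its field names, and the sample figures are all assumptions made for this example, not a prescribed reporting format. It also assumes every backup or replica is a full copy, which will not hold for every storage system.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """One managed storage service (hypothetical record for illustration)."""
    name: str
    first_copy_tb: float      # first copy content only, no backups or replicas
    discoverable_tb: float    # portion of first copy that is openly discoverable
    sensitive_suitable: bool  # promoted as suitable for sensitive data
    total_copies: int         # first copy plus backups and replicas


def macro_view_answers(services: list[Service]) -> dict[str, float]:
    """Return estimated answers to questions 1-4, in terabytes."""
    return {
        "q1_first_copy_tb": sum(s.first_copy_tb for s in services),
        "q2_openly_discoverable_tb": sum(s.discoverable_tb for s in services),
        "q3_sensitive_first_copy_tb": sum(
            s.first_copy_tb for s in services if s.sensitive_suitable
        ),
        # Assumes each additional copy is a full copy of the first copy.
        "q4_total_storage_tb": sum(
            s.first_copy_tb * s.total_copies for s in services
        ),
    }


if __name__ == "__main__":
    # Made-up sample figures, for illustration only.
    inventory = [
        Service("research-archive", 400.0, 50.0, False, 2),
        Service("secure-enclave", 100.0, 0.0, True, 3),
    ]
    for question, volume_tb in macro_view_answers(inventory).items():
        print(f"{question}: {volume_tb:.0f} TB")
```

Note that questions 1 to 3 count only the first copy, so only question 4 grows when backups or replicas are added.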

 

For further information and to get your organisation involved, please email Claire Rye, NeSI Product Manager - Data Services, claire.rye@nesi.org.nz.

 

 
