Data and metadata are the foundation of reproducible biomedical research. Wet lab researchers, funding bodies and research managers often struggle, however, to find robust solutions for implementing smart data stewardship — collecting not only bare-minimum results data but also valuable ancillary information, including experimental process data and measurements, commonly referred to as metadata. Collecting such a rich dataset to form a complete experimental record — one that includes the most minute and seemingly irrelevant physical conditions of a given experiment — poses a challenge for research groups around the world. Addressing this challenge by capturing all of this data is the key to ensuring reproducibility, which would improve the chance of research findings being translated into new treatments for patients around the world. The question remains: how do we make sure we collect experimental metadata in a comprehensive and accessible way?
Global science stakeholders are working hard to establish standards and procedures that enable better metadata collection, as well as easier access and re-use by scientists aiming to validate new findings. The most prominent example of such standards is the FAIR guidelines. This grass-roots initiative, proposed by a group of scientists in a Nature article, stipulates that experimental data have to be findable, accessible, interoperable, and reusable. Since the article was published in 2016, the FAIR guidelines have become an internationally accepted guidebook for increasing transparency and reproducibility in research. During the past four years, various policy-makers and scientific community stakeholders have been actively building awareness of metadata quality and FAIR data principles among scientists. One example of such an organisation is the UK Reproducibility Network (UKRN). The UKRN is a national peer-led consortium that investigates the factors contributing to robust research, promotes training activities, and disseminates best practice.
On the 22nd of October, alongside the UKRN, we co-hosted an online workshop, From Data to Metadata: Ensuring reproducibility in biomedical research, focusing on the importance of, and solutions for, metadata collection in biomedicine. The workshop featured five talks covering different aspects of how best to support research reproducibility and which frameworks have been designed for experimental data collection so far. The event concluded with a panel discussion, where participants shared their own experiences of implementing FAIR data practices and metadata capture.
The first talk, FAIR: From Principles to Practices, was given by Professor Susanna-Assunta Sansone from the University of Oxford. In her talk, Susanna discussed the importance of FAIR principles and the opportunities for science that stem from implementing them in research. She argued that for experimental reproducibility to be achieved, the research community has to adopt a set of standards and policies that make data collection comprehensible and reusable for other researchers. As an example of activities in this space, her lab runs fairsharing.org, an online resource that helps to establish standards for metadata collection.
In the second talk, Dr Philippe Rocca-Serra from the University of Oxford covered important aspects of data and metadata stewardship in biomedical research. The chain of discovery in biomedical research has many stakeholders who need to be able to communicate ideas and results via a facilitated knowledge exchange. This is particularly challenging in the biomedical context, where huge amounts of data of mixed quality are generated. To make matters worse, the data generated often lacks metadata records, which are critical for making experimental results findable, accessible, interoperable, and reusable. Implementing FAIR standards in this field will foster the shift towards more reproducible research in both academia and industry. To help, Philippe and his group created The FAIR Cookbook. This manual takes a holistic approach to data governance, and is designed to enlist the help of different research team members in transitioning to a FAIR-compliant research process.
In the third talk, The Arctoris Approach: Automated Data Generation & Metadata Capture, Dr Martin-Immanuel Bittner, CEO and Co-Founder of Arctoris, discussed how automation can aid data and metadata capture. Martin outlined the opportunities provided by implementing FAIR data principles and best data practices together with research task automation. From the moment an experiment is designed, unambiguous research protocols, automated data collection and comprehensive metadata capture enable experimental reproducibility and full compliance with FAIR data standards. The large, richly annotated data sets collected via this approach can then be mined using AI/ML methods, providing new insights based on a fully reproducible and auditable experimental pipeline. The premise of the Arctoris approach was also discussed in a recent IBI paper, Ensuring Reproducibility in Biomedical Research — The Role of Data, Metadata, and Emerging Best Practices.
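To make the idea of capturing metadata alongside results more concrete, here is a minimal Python sketch. It is not the Arctoris system or any published FAIR tooling; the class name, the field names (protocol_id, instrument, conditions) and the example values are all hypothetical, chosen only to illustrate how a single record can carry both the measurements and the context needed to repeat them.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ExperimentRecord:
    """Results plus the metadata needed to interpret and repeat them.

    All field names are illustrative assumptions, not a published standard.
    """
    protocol_id: str   # hypothetical identifier of the protocol version used
    instrument: str    # hypothetical name of the instrument that took the readings
    conditions: dict   # physical conditions, e.g. temperature and humidity
    results: list      # the raw measurements themselves
    recorded_at: str = field(
        # Timestamp is filled in automatically when the record is created.
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # sort_keys gives a stable serialisation, so the checksum is repeatable.
        return json.dumps(asdict(self), sort_keys=True)

    def checksum(self) -> str:
        # SHA-256 of the serialised record, usable to detect later changes.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


def missing_metadata(record, required=("protocol_id", "instrument", "conditions")):
    """Return the names of required metadata fields that are empty."""
    return [name for name in required if not getattr(record, name)]


record = ExperimentRecord(
    protocol_id="assay-protocol-v1",
    instrument="plate-reader-01",
    conditions={"temperature_c": 37.0, "humidity_pct": 45.0},
    results=[0.12, 0.15, 0.11],
)
print(record.to_json())
print(missing_metadata(record))
```

Storing the conditions and protocol reference in the same record as the results means the context cannot be separated from the data by accident, and a simple completeness check such as missing_metadata can flag records that would be hard to reuse later.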
The second part of the workshop was dedicated to talks selected from submitted abstracts. Dr Kirsty Merrett (University of Bristol) and various event participants shared their experiences of teaching young scientists about research integrity, underlining the challenging aspects of this task. This was followed by a presentation from Louise Corti (University of Essex) on the importance and challenges of validating research that contains sensitive information. She discussed the role of safe havens in protecting and safely storing sensitive information used in research, such as patient data.
A common theme across all the talks and the discussion that followed was that responsible, thoughtful and transparent research is the foundation of reproducible science. Automated data collection, established standards, and open sharing of data and knowledge will enable us to overcome the reproducibility crisis afflicting the biomedical sciences today.