Open Research & FAIR Case Studies


Open research is the practice of making the processes and outputs of research transparent and freely accessible, whenever possible. In support of The University of Sheffield’s new Open Research Statement, the case studies below demonstrate how these principles are being put into practice. The first case studies are from the winners of the University Library’s inaugural Open Research Prize in Spring 2021. The winner was Tim Craggs of the Department of Chemistry, whose research group has developed a fully open-source smFRET instrument with applications spanning disciplines across biophysics, biology and biomedicine.

The case studies are also available in PDF format in the University of Sheffield data repository, ORDA: https://doi.org/10.15131/shef.data.c.5621626

Open Research Case Studies 

Tim Craggs - Developing a new open-source instrument

The Craggs Lab in the Department of Chemistry have developed a fully open-source instrument for Single-molecule Förster Resonance Energy Transfer (smFRET) measurements, a powerful technique with applications spanning many disciplines across biophysics, biology and biomedicine.

Despite the many advantages of smFRET, which allows scientists to make measurements on a molecule by molecule basis, it is not widely used outside specialist labs. This is largely due to the high costs of commercial instruments and lack of self-build alternatives. To address this, we published a paper in Nature Communications [1] including detailed build instructions, parts lists and open-source acquisition software for a new instrument: the smfBox. This would enable a broad range of scientists to perform confocal smFRET experiments on a validated, self-built, robust and economic instrument. The paper has already received more than 6,000 downloads and has generated a large amount of media attention, with coverage by 13 news outlets and an Altmetric score of 220. 

Applying open research in biophysics

Through our publication and the linked GitHub site, we provided everything needed to build and run the instrument, from hardware schematics to open-source software. We also provided open-source software for complete analysis of the data, in the form of a series of Jupyter notebooks. This allows other scientists to interact with and modify our datasets, which have a permanent DOI and include both data and analysis. This ‘open analysis’ approach allows complete transparency in the data we publish. Anyone can reproduce our analysis and figures, or alter analysis parameters to see the effects on the data, thereby establishing for themselves its robustness and any limitations.

As champions of open science and an elected member of the scientific advisory board to the international FRET community, we have encouraged the adoption of a standardised file format - HDF5 - for saving raw data. The HDF5 file architecture is FAIR-compliant, machine-readable and stores all required metadata alongside raw experimental data in a single file. The smfBox automatically saves metadata and raw data in the HDF5 file format, allowing users to analyse their data with a range of compatible software solutions, including our own open analysis Jupyter notebooks. We are now working with other researchers to help spread the use of this file format.

Open science is at its best when others can easily take advantage of the progress made, and we have promoted this through video demonstrations of our instrumentation. In the Journal of Visualized Experiments (JoVE), we offer a step-by-step video protocol [2] for using the smfBox to make accurate single-molecule FRET measurements. To increase access to this article, we have funded the full open access charge from our own consultancy funds (as UKRI pots are ineligible for this – a situation that ought to change).

Looking to the future

Open research is not about one thing; it is a way of life in the Craggs Lab, with the overall aim of making our science available, understandable and usable by the largest possible number of people. As a result of our open research, the smfBox is currently being built by at least five other labs, in the US, Denmark and South Korea. We are also establishing a spinout company to produce a version of the instrument and software for sale and distribution to labs around the world. This company has received its first pre-seed investment funding, showing that open research can also lead to commercial opportunities.

Our ethos of open research includes many activities, from publishing all of our papers on relevant preprint servers [3] and the new approach of open-hardware instrumentation - in which we have been recognised as early leaders [4] - to establishing open data and analysis through encouraging standardised file formats for our field. Only through this multifaceted approach can open research realise its promise.

Our open research

  • Build instructions and parts list for smfBox made openly available 
  • Open-source software provided in form of Jupyter notebooks 
  • Step-by-step video protocol made available open access.

References

[1] Ambrose, B. et al. (2020). The smfBox is an open-source platform for single-molecule FRET. Nature Communications 11: 5641. https://doi.org/10.1038/s41467-020-19468-4

[2] Abdelhamid, M. et al. (2021). Making precise and accurate single-molecule FRET measurements using the open-source smfBox. JoVE 173: e62378. https://doi.org/10.3791/62378

[3] Craggs, T. et al. (2018). Substrate conformational dynamics drive structure-specific recognition of gapped DNA by DNA polymerase [Preprint]. bioRxiv. https://doi.org/10.1101/263038

[4] Fantner, G. and Oates, A. (2021). Instruments of change for academic tool development. Nature Physics 17: 421–424. https://doi.org/10.1038/s41567-021-01221-3

Paul Schneider and Robert Smith - Creating new data with parkrun

Paul Schneider and Robert Smith are PhD students in the Wellcome Trust DTC for Public Health, Economics & Decision Science. They found their first-year research attachment with ‘parkrun UK’ so interesting that they continued to work on it even after they moved on to their PhD research. They tell us more about their work with the organisation that has encouraged thousands of people across the UK to take up the weekly challenge of a 5km run or walk.

An interesting opportunity for a research attachment arose when, in December 2018, parkrun received funding from Sport England to set up 200 additional events. The aim of this was to further increase participation, particularly from deprived communities. We established a collaboration with parkrun UK and helped them to better understand the current disparities in access to and participation in parkrun events in England. This involved the development of a geospatial optimisation algorithm which provided recommendations for the best parks and green spaces in which to establish new parkrun events.
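To give a feel for the kind of problem a geospatial optimisation algorithm of this sort addresses, the sketch below shows one simple greedy approach in Python. It is not the algorithm used in the project, whose details are not described here. The function names, the 'weight' field (standing in for something like population scaled by deprivation) and the toy coordinates are all assumptions made purely for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def total_weighted_distance(areas, events):
    """Sum, over all areas, of weight x distance to the nearest event."""
    return sum(
        area["weight"]
        * min(haversine_km(area["lat"], area["lon"], ev["lat"], ev["lon"]) for ev in events)
        for area in areas
    )


def recommend_parks(areas, existing_events, candidate_parks, n_new):
    """Greedily choose up to n_new candidate parks.

    At each step, pick the park whose addition gives the lowest
    total weighted distance across all areas.
    """
    events = list(existing_events)
    remaining = list(candidate_parks)
    chosen = []
    for _ in range(min(n_new, len(remaining))):
        best = min(remaining, key=lambda park: total_weighted_distance(areas, events + [park]))
        chosen.append(best)
        events.append(best)
        remaining.remove(best)
    return chosen


# Toy, made-up inputs (hypothetical areas, weights and parks).
areas = [
    {"name": "Area A", "lat": 53.38, "lon": -1.47, "weight": 1.0},
    {"name": "Area B", "lat": 53.80, "lon": -1.75, "weight": 3.0},
]
existing_events = [{"name": "Existing event", "lat": 53.38, "lon": -1.48}]
candidate_parks = [
    {"name": "Park near A", "lat": 53.39, "lon": -1.46},
    {"name": "Park near B", "lat": 53.79, "lon": -1.76},
]

for park in recommend_parks(areas, existing_events, candidate_parks, n_new=1):
    print(park["name"])
```

A greedy search like this is easy to follow and to reproduce, but it does not guarantee the best overall set of locations; an actual analysis would also need real population, deprivation and green-space data.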

Integrating openness into public health research

Throughout the project, we tried to make our research as transparent and accessible as possible. As this was initially planned as a short-term project adjacent to our PhDs, we wanted to ensure that other researchers could use the wealth of data provided by parkrun UK. We also wanted to enable researchers in the 22 countries where parkrun is currently active to reproduce and refine our methods.

Our research produced multiple outputs, including an interactive map that shows existing parkrun events and recommended locations for future events. Since recommended locations were not always suitable to host running events, the map proved useful for parkrun UK in allowing them to identify alternative locations in close proximity. Our work also informed parkrun UK’s broader strategy for making their running events more inclusive, as illustrated by a 2020 press release: 

Decisions about where to locate events have been informed by Rob’s expertise and insight, as have efforts to grow participation at those events once they have been established […] One example of how the statistical tool was used is the creation of Bowling Park parkrun, located in a deprived area of Bradford. Our local Ambassador, working with community groups, identified the location as an option for a parkrun event – which was corroborated by Rob’s work – and the event became a reality for the local people. [1]

Several open access publications resulted from the project [2,3,4], one of which is available in the Wellcome Open Research platform, which has an open peer-review process and staged version history. Preliminary results were also made available on preprint server medRxiv and promoted on social media to invite feedback. This led to an eagle-eyed reader spotting that their parkrun was missing from the map and informing us via the parkrun Facebook group. The bug was subsequently fixed, the interactive map updated and the paper corrected before submission to the journal. Rob also promoted our research to the wider public when he took part in Nicola Forwood and Danny Norman’s popular With Me Now parkrun podcast. 

Looking to the future

To ensure our research follows the FAIR principles and is reusable by others in the future, we have made our research as accessible as possible. All of our data have been made openly available in Zenodo [5,6] and GitHub, alongside an annotated version of the source code used to generate the results, meaning that others can replicate our findings. The source code was also submitted to two Repo-Hacks - day-long hackathons where researchers from different fields meet and try to reproduce the published research of others. Our study was successfully replicated and we received some useful feedback, enabling us to make further improvements. We have since built on the work in a subsequent open access publication [3], which provides data over a 10-year period. Rob has also given talks alongside representatives from the Wellcome Trust as part of their effort to encourage other researchers to make their research more open.

Our open research

  • Preliminary results shared on preprint server, leading to a reader correction of data
  • Publication of research output on the Wellcome Open Research platform
  • Data and code openly available in GitHub and Zenodo.
     

References

[1] Using research to improve inclusivity. parkrun UK [online]. 8 December 2020. https://blog.parkrun.com/uk/2020/12/08/using-research-to-improve-inclusivity

[2] Schneider, P. et al. (2020). Multiple deprivation and geographic distance to community physical activity events —achieving equitable access to parkrun in England. Public Health 189: 48-53. https://doi.org/10.1016/j.puhe.2020.09.002

[3] Smith, R. (2020). RobertASmith/DoPE_Public: Determinants of parkrun Engagement v1.0. [Dataset] Zenodo. https://doi.org/10.5281/zenodo.3596841

[4] Schneider, P. (2020). Code and Data Repository for: Multiple deprivation and geographic distance to community sport events — achieving equitable access to parkrun in England. [Dataset] Zenodo. https://doi.org/10.5281/zenodo.3866143

Tom Webb - Climate change and marine ecology

Tom Webb of the Department of Animal and Plant Sciences has benefited from open research practices over the course of his career. In his project on the effect of increased environmental temperatures on marine species, Tom looks at how he is improving his own research practices.

As a macroecologist and biodiversity scientist, I am dependent on other people’s data in my quest to better understand how marine life is changing in the Anthropocene. I have experienced the frustration of key datasets being unavailable or untidy, of broken links and code lacking documentation. I also understand the need to incentivise data providers, and follow best practice in acknowledging their efforts through citation and collaboration. As a data user, I also feel responsible for the ongoing process of improving my own research practices.

Applying open research in marine ecology

My progress towards open research is exemplified by my recent project on the thermal limits of marine species. It’s a study built on open data, developed using open-source software, with results published in an open access journal. It also encapsulates my belief that we must make the best use of existing data in our efforts to address the biodiversity and climate crises: collating, linking, remixing and enriching openly available data to ask and - eventually - answer novel questions.

I started with the aim of quantifying the thermal tolerances of marine species. Sometimes these limits have been determined experimentally, but the logistics, expense and ethics of this approach mean only a few hundred marine species will ever be assessed in this way. However, we do have access to over 78 million occurrence records for more than 150,000 species through the Ocean Biodiversity Information System (OBIS), as well as large open datasets of sea temperature. We wanted to find out if these existing datasets could be linked together to summarise the temperatures in which different marine species have been recorded, so we matched large samples of the occurrence data with openly available sea temperature datasets. Our estimates of thermal tolerance were shown to match experimental results well, demonstrating that it is possible to use open data to obtain accurate assessments of thermal tolerances, a vital indicator for predicting changes in distribution under climate change.
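The general idea of linking occurrence records to sea temperature and summarising the result per species can be sketched in a few lines of Python. This is a minimal illustration, not the published analysis (which was carried out in R). The grid-cell matching, the choice of summary statistics, the function names, the species names and all the numbers are assumptions invented for the example.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_cell(lat, lon, cell_size=1.0):
    """Index of the grid cell (in units of cell_size degrees) containing a point."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def thermal_affinities(occurrences, sea_temperature, cell_size=1.0):
    """Summarise the temperatures at which each species has been recorded.

    occurrences: iterable of (species, latitude, longitude) records.
    sea_temperature: dict mapping a grid-cell index to a temperature in deg C.
    Records falling in a cell with no temperature value are skipped.
    """
    temps_by_species = defaultdict(list)
    for species, lat, lon in occurrences:
        temp = sea_temperature.get(grid_cell(lat, lon, cell_size))
        if temp is not None:
            temps_by_species[species].append(temp)

    return {
        species: {
            "n": len(temps),
            "min": min(temps),
            "mean": mean(temps),
            "max": max(temps),
        }
        for species, temps in temps_by_species.items()
    }


# Toy, made-up inputs: hypothetical species and temperatures, not real data.
sea_temperature = {(50, -5): 12.0, (51, -5): 11.0, (60, 2): 8.0}
occurrences = [
    ("Species A", 50.3, -4.2),
    ("Species A", 51.7, -4.9),
    ("Species A", 60.1, 2.5),
    ("Species B", 60.9, 2.1),
    ("Species B", 70.0, 10.0),  # no temperature for this cell, so skipped
]

for species, summary in thermal_affinities(occurrences, sea_temperature).items():
    print(species, summary)
```

In a real workflow, the occurrence records and temperature grids would come from open sources such as OBIS and public sea temperature products, and the summaries would then be compared against experimentally determined limits.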

As well as using openly available data, thereby reducing the need to expend resources on collecting new data, we wanted to make our own research as widely accessible as possible. A good illustration of this is the Data Availability Statement in our article published in Ecology and Evolution (2020): 

'A major aim of this work is to make the tools required to replicate, adapt, and extend the methods presented freely available to the community. Our work uses existing publicly available data, and we show users how to access the same data from within the open source statistical environment R. Processed datasets and code for analysis and visualization are available via GitHub and are also deposited in Figshare [1] via the University of Sheffield's Online Research Data repository.’ [2]

This model of open data has become my preferred workflow, with data processed and analysed using open-source tools, archived in ORDA, and documented with an extensive readme serving as both documentation and tutorial. (There are more examples of our open data in GitHub and ORDA [3].)

Looking to the future

As part of our research, we created a data product [4] for the European Marine Observation and Data Network (EMODnet), showing how to derive, summarise and visualise thermal affinities for European marine species. Building on this, I am delighted to be leading the data products development team for Phase IV of EMODnet Biology (2021-23), a fantastic opportunity to work with scientists, data professionals, and research software engineers to make useful, accessible products to help shape international marine policy.

As someone of moderate technical ability, I have found the process of improving my research practice challenging, and it has taken me too long to adopt some aspects of good practice. But I am now embedding open research into the culture of my research group, and I am proud that the scientists I am training - including my early career co-authors [2] - are taking these principles with them wherever they go next.

Our open research

  • Existing open biodiversity & climate data used extensively; newly generated data & code published
  • Open data placed in repositories with documentation to maximise accessibility & usability
  • Open research incorporated into research group training to help spread best practices

References

[1] Webb, T. (2020). Data and code for Occupancy-derived thermal affinities reflect known physiological thermal limits of marine species. [Dataset] University of Sheffield, Figshare. https://doi.org/10.15131/shef.data.12249686.v1

[2] Webb, T. et al. (2020). Occupancy-derived thermal affinities reflect known physiological thermal limits of marine species. Ecology and Evolution 10(14): 7050–7061. https://doi.org/10.1002/ece3.6407

[3] Webb, T. (2020). Linking dimensions of data on global marine animal diversity. [Software] University of Sheffield, Figshare. https://doi.org/10.15131/shef.data.12833891.v1

[4] Webb, T. and Lines, A. (2018). Thermal affinities for European marine species. [Dataset] Marine Data Archive. https://doi.org/10.14284/378

Robert Shaw - Creating new open-source software in computational chemistry

Postdoctoral researcher Robert Shaw and colleagues in the Department of Chemistry have successfully engaged with open research through the development of open-source software project libecpint.

Computational chemistry is increasingly used to guide and interpret experiments, as well as develop and test underlying theories of how chemistry happens. An important example is the modelling of systems containing heavier elements of the periodic table. These play vital roles in a range of areas, including improving sustainability of the chemical industry, producing new smart materials, and the nuclear fuel cycle. However, the ‘effective core potentials’ required for accurate and efficient calculations on these systems are typically only available in proprietary software. 

Applying open research to computational chemistry

When we looked at creating a piece of open-source software in this area, we knew that it needed to be reusable and reproducible across various software packages. We therefore set about developing an open-source library, using sustainable software development practices, that would provide effective core potential functionality to other programs. Improving the approaches used in these calculations would also greatly reduce computational expense. 

The novel algorithms we developed led to speed-ups of up to forty times over existing literature approaches, and we realised our implementation may also be beneficial in commercial computational chemistry packages. With this in mind, we released the code under the MIT open-source licence, allowing code reuse for open-source or proprietary projects. The next step in making the code accessible to the community was online hosting with version control on GitHub. In order to make the project sustainable, we wanted to ensure that others could contribute easily and meaningfully, so we added a number of helpful features. These included documentation for users and developers, a code of conduct and ‘architecture statement’ for contributors, and continuous integration to help find and correct errors before they caused problems for the project. Some of these valuable additions arose from engaging in the Journal of Open Source Software (JOSS) open peer review process.

The software library itself is an open research output and has been assigned a Zenodo DOI [1], making the software citable and helping to attract additional contributors. We have also produced several open access, peer-reviewed articles [2], including the article published in JOSS [3], which provides a statement of need for the software and its functionalities.

Another motivation for ensuring the openness of our research was alignment to the FAIR principles, and we aimed to make the algorithms required for calculations both accessible and reusable. We feel it is particularly important that a ‘reference implementation’ such as libecpint is open, free and meets community standards for sustainable software, as one of its primary purposes is to help in the creation of other software implementations. Our open approach has therefore enabled users to adapt and further develop the work themselves.

Looking to the future

The impact of making this research and the resulting software open is that it has now been incorporated into at least four computational chemistry packages. These include the Entos package (commercial, but free for academic use) and the open-source packages QCSerenity, VOTCA and Psi4. Notably, the inclusion in the popular Psi4 package led to my contributing to, and becoming a named author of, Psi4 and its corresponding journal article [4].

It was surprising, but positive, to find there was a larger demand for the software than had been anticipated. However, this has been something of a double-edged sword; while other interested researchers have contributed code that improves the software, there have also been requests for additional features or changes that have led to extra work. There have also been difficulties in navigating various sustainable software technologies, such as code inspectors and continuous integration, with limited expertise, time and resources. 

Funding or recognition for ensuring that scientific code is open and sustainable has been incredibly limited in the past, and it is very pleasing that the scientific community is making large strides to address the culture of irreproducibility. On a personal note, the positive aspects of the time and effort invested are an increase in important skills and the knowledge that the software will be usable, and improvable, by the community for years to come. 

Our open research

  • Open-source software accessible to contributors for further development
  • Publication in open access journal with open peer review
  • Software subsequently incorporated into other open-source packages

References

[1] Shaw, R. and Hill, J. (2021). Libecpint. [Software]. Zenodo. https://doi.org/10.5281/zenodo.4694353

[2] Shaw, R. and Hill, J. (2017). Prescreening and efficiency in the evaluation of integrals over ab initio effective core potentials. The Journal of Chemical Physics 147(7): 074108. https://doi.org/10.1063/1.4986887

[3] Shaw, R. and Hill, J. (2021). libecpint: A C++ library for the efficient evaluation of integrals over effective core potentials. Journal of Open Source Software 6(60): 3039. https://doi.org/10.21105/joss.03039

[4] Smith, D. et al. (2020). PSI4 1.4: Open-source software for high-throughput quantum chemistry. The Journal of Chemical Physics 152: 184108.

Kirsty Liddiard, Dan Goodley and Katherine Runswick-Cole - Working with young disabled people

Living Life to the Fullest has given children and young people with ‘life-limiting’ or ‘life-threatening’ impairments opportunities to ‘speak about their lives in “new” ways: as joyful, creative, fun, challenging, but ultimately liveable, just like anyone else’. [1] Dr Kirsty Liddiard and colleagues from the School of Education explain how open research has been central to this ESRC-funded project.

One of the main aims of Living Life to the Fullest was to enable disabled children and young people to tell their own stories through engagement with the arts. To achieve this, we established a Co-Researcher Collective made up of young disabled people who co-led the inquiry with us. This meant that the project was grounded in academic and social openness, ensuring the research was accessible to everyone involved, from co-researchers, participants and their families to allies and community partner organisations.

Applying open research in disability studies

We undertook a number of public engagement and knowledge translation activities, including conferences and research festivals, both in person and online. Sharing our research outside of the project proved beneficial not only to young disabled people but also to organisations and employers. This was demonstrated by an event to which we invited Youth Employment UK (YEUK), a leading organisation working to change the youth employment landscape. Our co-researchers took the opportunity to share their insights on how employers can better support young disabled people. As a result, YEUK recognised the need to revise their resources, both for young people and for employers, to be more inclusive of young disabled people. We subsequently worked with YEUK to develop a webinar and written guide for employers, both of which were shared with over 700 organisations.

With the help of ESRC Festival of Social Sciences funding, we hosted several successful events in Sheffield, and we have had the opportunity to reach regional and national audiences through contributions to BBC television and radio. We were also delighted to commission a public art installation by Louise Atkinson and to support participants in submitting their artwork to the Rightfullives online art exhibition.

Other activities promoting open research included our work with Canine Partners, a registered charity that transforms the lives of disabled people through assistance dogs. Following further exploration of our early findings, we contributed to the Canine Care Project, which featured a report and a professionally animated film that are openly available and fully accessible to disabled young people and their families.

Looking to the future

In addition to sharing the results of our research, our open research culture motivated us to share our unique ways of working with others in our Co-production Toolkit. Why Can’t We Dream? is an online collection of freely available, downloadable resources co-designed and built with our research partners. The toolkit shows how a diverse team of academics, organisations and young disabled people can successfully carry out a co-produced, arts-informed research project. The toolkit provides a range of resources for researchers, charities, non-profit organisations and schools wishing to work in co-production with marginalised young people. Offering podcasts, films and lesson plans, the toolkit has already been adopted by a number of organisations around the UK, and we hope that our culture of open research will lead to more exciting projects and opportunities for young people in the future. 

Our open research

  • Research processes and findings shared via videos, blog and social media
  • Openly available online toolkit created for organisations working with young disabled people
  • Open and transparent collaboration with young disabled people and partner organisations

References

[1] Living Life to the Fullest. [Website] https://livinglifetothefullest.org/about/

[2] Liddiard, K. et al. (2019). Working the edges of Posthuman disability studies: theorising with disabled young people with life-limiting impairments. Sociology of Health and Illness 41(8): 1473-1487. https://doi.org/10.1111/1467-9566.12962

[3] Liddiard, K. et al. (2018). “I was excited by the idea of a project that focuses on those unasked questions”: Co-Producing disability research with disabled young people. Children and Society 33(2): 154-167. https://doi.org/10.1111/chso.12308

FAIR Case Studies: Good Practice on FAIR Data and Software

FAIR stands for Findable, Accessible, Interoperable, and Reusable and comprises a set of principles designed to increase the visibility and usefulness of your research to others. You can find more guidance on the FAIR principles below.

Find out more about FAIR Case Studies
 
