Openness in science


March 13, 2018

This is the first post in a series on open research in science, with a focus on neuroscience. The aim of this series is to explore what open research is, why it is important, what tools are available, and how it can benefit researchers, as well as society, more generally. If you have any suggestions or would like me to cover specific topics, feel free to leave a comment.

Open Research

What is it?

Open research revolves around the following three pillars, which all relate to making information freely available to the public:

Open access: this means free and unrestricted online access to scientific publications¹. The argument is that publicly funded research should be freely accessible to the public. For example, open access allows the general public to obtain evidence-based information, helps journalists to accurately report scientific discoveries, and gives access to institutions which may not be able to afford subscriptions to scientific journals.

Open data: this means making datasets freely available to the public. It allows other researchers to reproduce your results, build on your data, or analyse it in a novel way.

Open source: this means making the source code underlying data acquisition, data analysis/processing routines, simulations, models, and statistical analyses (etc.) freely available to the public. The code can be shared and built on for further uses²,³.

Standardised platforms and open data formats are tools which are necessary to make open data and data sharing possible.

Why?

Open research benefits not only individual researchers but society as a whole. It opens up research to the general public, promotes collaboration, increases transparency and reproducibility, increases the visibility of researchers' work, fosters good scientific practice, and allows existing data to be re-analysed and re-purposed.

Transparency, reproducibility, and data re-use

Surveys and replication efforts in different scientific fields show that reproducibility is a growing concern in research. A survey of more than 1500 researchers carried out by Nature revealed that more than 70% of respondents had tried and failed to reproduce another scientist's experiments. Similarly, the Reproducibility Project: Psychology, a large collaborative effort coordinated by the Center for Open Science, managed to replicate only approximately 40% of the original findings of 100 selected psychology studies. Poor reproducibility can stem from, among other things, complicated and poorly annotated protocols, statistical errors, selective reporting, a small effect size, or a flawed study design. Greater transparency of the protocols and statistical analyses, as well as making datasets open, could go a long way towards avoiding such reproducibility issues. Besides increasing transparency, making datasets openly available avoids unnecessary duplication of effort and allows the data to be re-purposed. Other researchers may be interested in analysing your data in a different way to address their own research questions, or in including it in a meta-analysis. Either way, provided proper credit is given, data re-use seems like an efficient way of encouraging new discoveries.

Challenges

Reward structures in academia - what qualifies as research output?

You may be familiar with the admonition "Publish or Perish". It describes the pressure academics face to regularly publish in high-impact journals in order to increase their chances of having a successful scientific career. The pressure to publish groundbreaking discoveries has been linked to reproducibility issues in science⁴. Meanwhile, confirmatory or negative findings are not rewarded (one solution to this is Registered Reports, which I will cover in a subsequent blog post). The current reward structure in academia means that researchers may be reluctant to make their data open, particularly if they plan to publish several papers based on a single dataset. If, however, the publication of datasets were considered research output and rewarded accordingly, researchers might be encouraged to make their data open sooner. This would reward research productivity and limit publication bias (when the outcome of the research determines whether and where it is published). Some journals welcome the publication of datasets accompanied by descriptors. For example, you can submit datasets as part of a Tools and Resources article at eLife or to the open access journal Scientific Data.

Structuring your data for open research requires effort

Some technically adept researchers may format their data for sharing almost by default. Others may see the benefits of making data open, yet see it as another thing they don't have time for. Attempts to set up a data infrastructure that makes sense always seem to be hampered by the need to urgently finish some analysis or a set of experiments. The thing is, structuring your data in such a way that it is easily shareable is not only good for open data, it's also really helpful for you. If you always keep in mind that someone else will have to understand a particular experiment/analysis one day, you're more likely to structure it clearly and make useful annotations, which you'll be really happy to find when you look at that data again a couple of months later. A good data infrastructure also helps to keep track of any changes in your data analysis. It will end up saving you a bunch of time while also ensuring other researchers can easily make sense of your data and possibly even re-use it one day. And while version control and analysing large datasets efficiently will inevitably involve familiarising yourself with a terminal and some programming, those skills will be immensely useful for whatever you choose to do after your PhD.
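To make the idea of "useful annotations" a little more concrete, here is a minimal Python sketch of one way to store a dataset in an open format (CSV) next to a small machine-readable description (JSON). The function name save_with_metadata, the file layout, and the metadata fields are all assumptions made up for this illustration. They are not a community standard, and established formats in your field may well be a better choice.

```python
import csv
import hashlib
import json
from pathlib import Path


def save_with_metadata(rows, fieldnames, out_dir, name, description, units):
    """Write rows to <name>.csv and a describing sidecar file <name>.json.

    Hypothetical helper for illustration only: the metadata fields below
    are an assumption, not an established schema.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Plain CSV can be opened without any proprietary software.
    data_path = out_dir / f"{name}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # A checksum lets others verify that their copy of the file is unchanged.
    checksum = hashlib.sha256(data_path.read_bytes()).hexdigest()

    metadata = {
        "file": data_path.name,
        "description": description,
        "columns": [{"name": c, "unit": units.get(c, "")} for c in fieldnames],
        "n_rows": len(rows),
        "sha256": checksum,
    }
    meta_path = out_dir / f"{name}.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path
```

Calling save_with_metadata with a list of dictionaries (one per trial, say), the column names, and a dictionary of units would leave two small text files side by side. Both can be tracked with version control, and both should still be readable in a few years' time.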

Funding providers increasingly encourage open research

Many research funders have implemented policies that aim to maximise the impact of research output by encouraging or requiring open access and the sharing of resources and tools. The Deutsche Forschungsgemeinschaft (DFG) supports open access but does not make it obligatory (see their Open Access FAQ). Some funding organisations may also request an output management plan as part of their grant schemes to ensure that any valuable research output (databases, software tools, etc.) is shared for other researchers to build on⁵,⁶. At the EU level, after a successful Open Research Data Pilot, Horizon 2020 has now made open research data the default, with opt-out options for relevant situations, such as protecting privacy or intellectual property⁷. The Sherpa Juliet database is an excellent tool to find out more about funders' policies on open access and data archiving.

What are some things I can do as a researcher (in neuroscience)?

I'll elaborate on this in subsequent blog posts, but for now, here are a few resources that you may find useful:

The International Neuroinformatics Coordinating Facility (INCF) is advancing data sharing, integration and re-use in neuroscience through resources and training
Have a look at the Neuroscience Information Framework, a repository of resources and tools in neuroscience. They also offer a set of webinars, which you can catch up on via their YouTube channel
Find out what repositories are out there, for example by searching the Registry of Research Data Repositories
The G-Node (in the LMU Biocenter) offers a data infrastructure service called GIN as well as other tools (I will write a blog post focusing on G-Node tools in the future)
A Horizon 2020 Open Research Platform is in the pipeline. It will offer researchers funded by Horizon 2020 the opportunity to publish their peer-reviewed research and pre-prints for free⁸
Finally, you may find these resources useful:
- table of resources derived from the Open Science Prize website
- interesting open source tools at OpenBehavior
- PyPI for Python libraries

Stay tuned for my next posts in this series, which will cover researchers' attitudes towards open data, some example initiatives based on open research practices, and data sharing tools in neuroscience.

References

1. Open Access to scientific information
2. Standard practices for code
3. Extending transparency to code
4. Reproducibility Crisis
5. Funders' Policies Research Data Management
6. Outputs Management Plan, Wellcome Trust
7. Open Research Data
8. Horizon 2020 Open Research Publishing Platform

Delwen Franzen