This is the first post in a series on open research in science, with a focus on neuroscience. The aim of this series is to explore what open research is, why it is important, what tools are available, and how it can benefit researchers as well as society more generally. If you have any suggestions or would like me to cover specific topics, feel free to leave a comment.
What is it?
Open research revolves around the following three pillars, which all relate to making information freely available to the public:
Open access: this means free and unrestricted online access to scientific publications1. The argument is that publicly funded research should be freely accessible to the public. For example, open access allows the general public to obtain evidence-based information, helps journalists to accurately report scientific discoveries, and gives access to institutions which may not be able to afford subscriptions to scientific journals.
Open data: this means making datasets freely available to the public. It allows other researchers to reproduce your results, build on your data, or analyse it in a novel way.
Open source: this means making the source code underlying data acquisition, data analysis/processing routines, simulations, models, and statistical analyses (etc.) freely available to the public. The code can be shared and built on for further uses2,3.
Standardised platforms and open data formats are necessary to make open data and data sharing possible.
Open research not only benefits individual researchers, but society as a whole. It opens up research to the general public, promotes collaboration, increases transparency and reproducibility, increases the visibility of researchers' work, fosters good scientific practice, and allows existing data to be re-analysed and re-purposed.
Transparency, reproducibility, and data re-use
Surveys and replication efforts in different scientific fields demonstrate that reproducibility issues are an increasing concern in research. A survey of more than 1500 researchers carried out by Nature revealed that many researchers from different disciplines have failed to replicate others' experiments. Similarly, a project by the Open Science Collaboration only managed to replicate approximately 40% of the original findings of 100 selected psychology studies. Poor reproducibility can stem from, among other things, complicated and poorly annotated protocols, statistical errors, selective reporting, a small effect size, or a flawed study design. Greater transparency of the protocols and statistical analyses, as well as making datasets open, could go a long way towards avoiding such reproducibility issues. Besides increasing transparency, making datasets openly available avoids unnecessary duplication of effort and allows the data to be re-purposed. Other researchers may be interested in analysing your data in a different way to address their own research questions, or in including it in a meta-analysis. Either way, provided proper credit is given, data re-use seems like an efficient way of encouraging new discoveries.
Reward structures in academia - what qualifies as research output?
You may be familiar with the admonition "Publish or Perish". It describes the pressure academics face to regularly publish in high-impact journals in order to increase their chances of having a successful scientific career. The pressure to publish groundbreaking discoveries has been linked to reproducibility issues in science4. Meanwhile, confirmatory or negative findings are not rewarded (one solution to this is Registered Reports, which I will cover in a subsequent blog post). The current reward structure in academia means that researchers may be reluctant to make their data open, particularly if they plan to publish several papers based on a single dataset. If, however, the publication of datasets were considered research output and rewarded accordingly, it might encourage researchers to make their data open sooner. It would reward research productivity and limit publication bias (when the outcome of the research determines whether and where it is published). Some journals welcome the publication of datasets accompanied by descriptors. For example, you can submit datasets as part of a Tools and Resources article at eLife or to the open access journal Scientific Data.
Structuring your data for open research requires effort
Some technically adept researchers may format their data for sharing almost by default. Others may see the benefits of making data open, yet see it as another thing they don't have time for. Attempts to set up a data infrastructure that makes sense always seem to be hampered by the need to urgently finish some analysis or a set of experiments. The thing is, structuring your data in such a way that it is easily shareable is not only good for open data, it's also really helpful for you. If you always keep in mind that someone else will have to understand a particular experiment/analysis one day, you're more likely to structure it clearly and make useful annotations, which you'll be really happy to find when you look at that data again a couple of months later. A good data infrastructure also helps to keep track of any changes in your data analysis. It will end up saving you a bunch of time while also ensuring other researchers can easily make sense of your data and possibly even re-use it one day. And while version control and analysing large datasets efficiently will inevitably involve familiarising yourself with a terminal and some programming, those skills will be immensely useful for whatever you choose to do after your PhD.
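To make the idea of "useful annotations" a bit more concrete, here is a minimal sketch in Python of one possible habit: saving every data file in an open format (CSV) together with a small JSON "sidecar" file that describes it. This is only an illustration under my own assumptions, not an established standard. The function name `save_with_metadata`, the metadata fields, the file names, and the example values are all made up.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def save_with_metadata(rows, header, data_path, description, units):
    """Write rows to a CSV file plus a JSON sidecar that describes them.

    Hypothetical helper for illustration; the metadata fields are an
    assumption, not a community standard.
    """
    data_path = Path(data_path)
    data_path.parent.mkdir(parents=True, exist_ok=True)

    # Plain CSV is an open format that most analysis tools can read.
    with data_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

    # A checksum lets others verify they have the exact file you analysed.
    checksum = hashlib.sha256(data_path.read_bytes()).hexdigest()

    metadata = {
        "data_file": data_path.name,
        "description": description,
        "columns": list(header),
        "units": units,  # maps column name -> unit
        "n_rows": len(rows),
        "sha256": checksum,
    }
    # The sidecar sits next to the data file, with the same base name.
    sidecar_path = data_path.with_suffix(".json")
    sidecar_path.write_text(json.dumps(metadata, indent=2))
    return sidecar_path


# Example with made-up values, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = save_with_metadata(
        rows=[(0.0, -65.2), (0.1, -64.8)],
        header=["time_s", "membrane_potential_mV"],
        data_path=Path(tmp) / "example_session" / "recording_01.csv",
        description="Made-up example trace, for illustration only",
        units={"time_s": "s", "membrane_potential_mV": "mV"},
    )
    metadata = json.loads(sidecar.read_text())

print(json.dumps(metadata, indent=2))
```

If the data file and its sidecar are kept under version control together, a description of what each column means travels with the data, which is exactly what you (or someone else) will want a couple of months later.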
Funding providers increasingly encourage open research
Many funders of research have implemented policies that aim to maximise the impact of research output by encouraging or requiring open access and the sharing of resources and tools. The Deutsche Forschungsgemeinschaft (DFG) supports open access but does not make it obligatory (see their Open Access FAQ). Some funding organisations may also request an output management plan as part of their grant schemes to ensure that any valuable research output (databases, software tools, etc.) is shared for other researchers to build on5,6. At the EU level, after a successful Open Research Data Pilot, Horizon 2020 has now implemented a default open access policy, with opt-out options for relevant situations, such as protecting privacy or intellectual property7. The Sherpa Juliet database is an excellent tool to find out more about funders' policies on open access and data archiving.
What are some things I can do as a researcher (in neuroscience)?
I'll elaborate on this in subsequent blog posts, but for now, here are a few resources that you may find useful:
- The International Neuroinformatics Coordinating Facility (INCF) is advancing data sharing, integration and re-use in neuroscience through resources and training, which you may find interesting
- Have a look at the Neuroscience Information Framework, a repository of resources and tools in neuroscience. They also offer a set of webinars, which you can catch up on via their YouTube channel
- Find out what repositories are out there, for example by searching the Registry of Research Data Repositories
- The G-Node (in the LMU Biocenter) offers a data infrastructure service called GIN as well as other tools (I will write a blog post focusing on G-Node tools in the future)
- A Horizon 2020 Open Research Platform is in the pipeline. It will offer researchers funded by Horizon 2020 the opportunity to publish their peer-reviewed research and pre-prints for free8.
- Finally, you may find these resources useful:
Stay tuned for my next posts in this series, which will cover researchers' attitudes towards open data, some example initiatives based on open research practices, and data sharing tools in neuroscience.
1. Open Access to scientific information
2. Standard practices for code
3. Extending transparency to code
5. Funders' Policies Research Data Management
6. Outputs Management Plan, Wellcome Trust
7. Open Research Data
8. Horizon 2020 Open Research Publishing Platform