Open research and data sharing: examples of success
In my last post, I wrote about some of the basics of open research: how it can address reproducibility issues in science, and some of the challenges researchers face. This post highlights some initiatives that promote or make full use of open research practices.
The Allen Institute for Brain Science & Janelia Research Campus
These institutes generate datasets, tools and resources and make them available to the scientific community to accelerate new discoveries. The Allen Institute for Brain Science, based in Seattle, generates large datasets on the mouse and human brain and shares them with other researchers according to an open science model. Its databases include gene expression profiles, connectivity maps, electrophysiological characteristics and single-cell morphology. Besides these datasets, the Allen Institute produces many other useful resources, including more than 100 transgenic mouse lines, open-source software packages, hardware components, and human and mouse reference atlases for research use. Tools and resources can be accessed on the Allen Brain Atlas Data Portal. For example, the Allen Cell Types Database provides electrophysiological, morphological and transcriptomic data for single cells in selected brain areas of the mouse and human. Its latest tool, the Allen Brain Observatory, launched in June 2016, shows visually evoked calcium responses of GCaMP6-expressing neurons in different layers of the mouse visual cortex in response to stimuli such as gratings, natural scenes and locally sparse noise. The Institute also provides the source code needed to read and process the data in the Allen Software Development Kit (AllenSDK).
The Janelia Research Campus is also advancing open science in neuroscience by sharing data, resources and tools with the scientific community via open-source platforms. Tools and resources include GAL4 fly lines, mouse lines, software, designs for set-up components, reagents and more.
International collaborations have been very successful in fields such as physics, astronomy and genomics, where researchers have needed to share large-scale instrumentation. For example, the LIGO (Laser Interferometer Gravitational-Wave Observatory) Scientific Collaboration, which recently detected gravitational waves, included more than 1,000 international researchers. In neuroscience, while shared facilities (e.g. the Advanced Imaging Center at Janelia) and international collaborations do exist, there is still much discussion on the ideal format [1]. Hurdles include establishing funding mechanisms that cross national borders, creating a standardised infrastructure to share data, tools, software and resources, and forming a working governance structure that is effective as well as supportive of researchers' careers. Other difficulties include agreeing on how specific the research question should be, and how best to study the activity of neuronal ensembles in the context of specific behaviours.
The International Brain Laboratory (IBL) is an international neuroscience collaboration, officially launched in September 2017, that is trying to address some of these concerns. The initiative is funded by the Simons Foundation, the Wellcome Trust and the INCF. In short, 21 experimental and computational neuroscience labs will work together to study the neural basis of a specific decision-making task. Decision making is a fundamental adaptive behaviour that involves integrating sensory cues, forming certain predictions based on previous experience, and choosing to execute a specific action.
Importantly, the IBL aims to advance standardisation efforts and to set a precedent for new ways of collaborating in neuroscience. While a certain degree of sharing and collaboration does of course already occur in neuroscience research, most research questions are addressed by individual labs. Naturally, these labs have limited resources and often use custom-made hardware and software in their experiments. This can make it difficult to compare or reproduce research findings. To address these problems, the IBL has developed a specific decision-making paradigm, through which its labs will probe neural activity in different brain regions using three distinct methodologies (Neuropixels recordings, 2-photon imaging, and fibre photometry). Theoretical neuroscientists will use these datasets to develop new theories and hypotheses. The aim is to record the coordinated activity across brain regions that underlies decision making.
Effective data sharing, storing and analysis
For the collaboration to work effectively, the IBL has developed a data infrastructure that also aims to advance data sharing practices across the neuroscience community as a whole. IBL researchers will be able to upload their data to a common database immediately after each experiment, allowing others within the IBL to build on those results. The raw data will be pre-processed and stored alongside the metadata in a standardised data format adopted by all laboratories. Provided that the decision-making paradigm can be replicated successfully across all participating institutions, this infrastructure will allow all the data to be pooled and analysed in the same way, again using standardised analysis pipelines. The IBL will also include a replication step: recordings in a particular brain region will always be duplicated in another laboratory for comparison, to ensure reproducibility. IBL researchers have also committed to making their experimental hardware, software and data architecture open source [2].
Architecture for data sharing used by the International Brain Laboratory. Reproduced with permission from The International Brain Laboratory, 2017. doi: https://doi.org/10.1016/j.neuron.2017.12.013, made open access under the CC BY 4.0 license.
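To make the replication rule described above more concrete, here is a toy model of a shared database. It is only a sketch under stated assumptions and not the IBL's actual software: the `Session` record, the `SharedDatabase` class, its methods and the lab and region labels are all hypothetical. Labs upload sessions in one agreed format, records that do not follow the format are rejected, and any brain region recorded by fewer than two independent labs is flagged as not yet replicated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One recording session plus the metadata every lab must supply (hypothetical schema)."""
    lab: str
    brain_region: str
    method: str
    subject_id: str


# The three methodologies named in the text; the exact labels are assumptions.
ALLOWED_METHODS = {"Neuropixels", "2-photon", "fibre photometry"}


class SharedDatabase:
    """Toy stand-in for a common database that all participating labs upload to."""

    def __init__(self):
        self.sessions = []

    def upload(self, session):
        # Reject records that do not follow the agreed, standardised format.
        if session.method not in ALLOWED_METHODS:
            raise ValueError(f"unknown recording method: {session.method}")
        self.sessions.append(session)

    def labs_per_region(self):
        # Map each brain region to the set of distinct labs that recorded it.
        labs = defaultdict(set)
        for s in self.sessions:
            labs[s.brain_region].add(s.lab)
        return labs

    def unreplicated_regions(self):
        # Regions recorded by fewer than two independent labs still need replication.
        return sorted(
            region for region, labs in self.labs_per_region().items() if len(labs) < 2
        )
```

For example, if `lab_A` and `lab_B` both upload Neuropixels sessions from `VISp`, but only `lab_A` uploads a 2-photon session from `CA1`, then `unreplicated_regions()` returns `["CA1"]`. Because every lab writes to the same schema, the check can run over the pooled data without any per-lab conversion step, which is the point of agreeing on a common format.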
The success of the IBL will largely depend on whether the standardised behavioural paradigm can be replicated in all 21 participating laboratories. The same applies to the recording procedures. During my PhD, I measured electrical signals from single cells using the patch-clamp method. I was acutely aware of all the experimental variables that could affect the outcome of an experiment: the composition of the internal solution, the temperature, the slicing quality, the freshness of the recording solution, the age of the animal and the pipette size could all affect the recording (not to mention other sources of variability such as human error). Achieving reproducibility within a consortium of 21 laboratories will therefore be a huge challenge in itself. But if it succeeds, the IBL will open up a new way of doing neuroscience and considerably advance efforts to share data, analysis routines, protocols and other tools efficiently. This will hopefully lead to more collaborations and engage a wider and more diverse range of actors in neuroscience research to drive discovery and innovation.
Structural Genomics Consortium
The Structural Genomics Consortium (SGC) is an example of how adopting an open science approach has catalysed drug discovery research. The SGC was established in response to a slowdown in drug discovery resulting from intellectual property policies and the increasing complexity of the underlying science. Funding is provided by multiple investors, which reduces the risk for any individual organisation. The SGC solves protein structures of medical importance and waives patent claims so that all research outputs can be made available and shared. A case study commissioned by the European Commission showed that the open science approach attracted a larger and more diverse group of researchers from academia, pharmaceutical companies and other stakeholders who would otherwise have been excluded for proprietary reasons, thereby expanding expertise and the potential to build on new discoveries. This approach also significantly reduced the chance of duplicated research efforts. In addition to accelerating discovery research, the SGC's model has also dramatically reduced the costs usually associated with this type of research [3].
The Open Science Prize
The Open Science Prize is an initiative from the US National Institutes of Health, the Howard Hughes Medical Institute and the Wellcome Trust that encourages researchers to develop tools that allow open data to be accessed or re-used to generate innovation and benefit society. The latest prototype to win the Open Science Prize is Nextstrain: a pipeline that processes genetic data from labs all over the world to reconstruct viral phylogenetic trees, which can be used to make epidemiological inferences (e.g. about the geographic spread of viruses such as Ebola).
Digital Social Innovation
Digital Social Innovation (DSI) is a rapidly growing field in which innovators, researchers and communities collaborate using digital technologies to tackle global societal challenges [4]. Digital technologies (digital fabrication, geotagging, machine learning, artificial intelligence, blockchain and more) are used to address issues in fields such as healthcare, democracy, education, migration and the environment. Initiatives use open-source technology, open data, crowdsourcing, social media and other tools to allow communities to work collaboratively and citizens to become more engaged in delivering social impact. The DSI4EU project, supported by the European Commission, aims to explore how emerging digital technologies and innovations can be leveraged to achieve greater social good. It has led to the creation of the Digital Social Innovation platform, an effort to expand the rapidly growing DSI community in Europe. The platform allows users to explore and contribute to a growing network of organisations and collaborative projects that are using DSI to tackle social challenges. A number of exciting initiatives have already been launched, including:
- Smart Citizen: the 'Smart Citizen Kit' (an Arduino-based sensor board) can be used to record environmental data such as temperature, concentrations of certain chemicals, humidity or sound intensity, which are then uploaded to a server and analysed. Users can modify or improve the kit so that new variables can be recorded. Smart Citizen also connects people with researchers who can use the collected data to address environmental problems.
- OpenCorporates: this database makes information on companies and the corporate world open to the general public. The data provides the means to understand and monitor the increasingly complex nature of companies, and to counter the use of companies for anti-social purposes, such as corruption.
- Too Wheels: this provides an open-source blueprint to build a sports wheelchair. The blueprint can be amended to fit particular dimensions, and the prototype can be built from inexpensive materials such as plywood and pipes.
These projects are further examples of how open science and data sharing practices encourage collaboration between people with diverse skillsets to foster new discoveries that benefit society.
My next posts will cover standardised data formats and researchers' attitudes towards open research.
1. Toward a Global BRAIN Initiative
2. International Brain Laboratory
3. Structural Genomics Consortium
4. Digital Social Innovation