Mathematical Geophysics Opinion

Embracing Open Data in Field-Driven Sciences

Allowing data to be reused and research results to be replicated fosters innovation, high-quality research, and public trust in science.

Climate change and the other complex issues facing our planet require the application of vast amounts of data, information, and knowledge to be fully understood. No single scientist or organization has all of the data, tools, or capabilities to do this work; only by bringing diverse research communities together around shared data and information will these problems be addressed.

However, ecology, geology, and the other sciences that depend on field observations pose special challenges for data and sample sharing. These disciplines often rely on time-sensitive, perishable data and samples that are temporally and spatially unique: the observations of an ash plume evolving or the samples amassed during a research cruise to assess the impacts of an oil spill.

These challenges have caused field-driven sciences to be slow in making data accessible and research reproducible. To keep research on the leading edge of discovery, data and sample sharing in the field sciences must evolve to become common practice.

The Value of Open Field Data

A recent policy forum article published in Science argues that open access to data and samples is fundamental for verifiable progress in the field sciences. According to the piece, making data and samples freely available promotes transparency, enhances the value of data by opening them up to further analysis, and strengthens both the integrity of research results and public trust in science.

A world with publicly available data and samples is one where scientists can achieve greater impact through collaboration while expanding global research. Take, for example, the Past Global Changes (PAGES) project. The data generated by PAGES-coordinated research projects are available in publicly accessible databases, including those of the National Centers for Environmental Information (formerly the National Geophysical Data Center), which is operated by the National Oceanic and Atmospheric Administration (NOAA).

Through PAGES, members of the international paleoclimate community assembled existing climate records spanning the past 2,000 years, drawn from tree rings, fossil pollen, and other proxy data. These data will also be open access, allowing others to use them to improve projections of future climate and help society prepare for climate-related hazards. Efforts like PAGES further scientific inquiry by enabling disparate research endeavors to examine deep pools of data already in existence.

Sharing data and samples also contributes to the health of humans and society. This was readily demonstrated years ago when NOAA released weather data to the public. Other groups used these data to generate more accurate forecasts, which lowered the costs of weather-related damage. In addition, a billion-dollar weather industry that relies on NOAA’s real-time data emerged; both Weather.com and Weather Underground are built on raw weather and climate data that the government makes openly available.

Increasing access to field sciences data and samples, such as those collected during a 2012 research cruise to explore the biological, chemical, and physical properties of a Southern Ocean phytoplankton bloom, will move science forward by facilitating reproducibility of results and reuse of data. Credit: Rebecca Fowler

To further critical research efforts and expand the use of its reams of climate, weather, and environmental data resources, NOAA launched the Big Data Project in 2015. The initiative aims to more effectively and efficiently distribute these data to scientists, decision makers, and industries and enable the development of new products and services that will further understanding of our planet. In short, releasing data benefits everyone.

Open data can also be an important tool for education and outreach. The Science article mentions the National Science Foundation’s Ocean Observatories Initiative, whose unifying cyberinfrastructure acquires, processes, and distributes data from more than 700 oceanographic instruments, not only to researchers but also to educators and the public. Anyone can plot or download the data. Such publicly available resources can encourage citizen science and public involvement in science.

Overcoming Data Hoarding

As the authors of the Science article acknowledge, sharing field sciences data and samples can be time-consuming and resource intensive. The cultural, financial, and technical barriers to making data open and research reproducible are, indeed, numerous.

Scientists may guard their data, fearing that sharing them might allow another research group to scoop them and publish results first. Productivity in science is traditionally measured by the number of papers a researcher publishes in journals with high impact factors. Massive amounts of time, money, and energy go into collecting scientific data, and these published papers represent a return on that investment.

The Science article maintains that bottom-up approaches are more likely to effect change than top-down mandates. Research institutions, funding agencies, and other stakeholders need to develop new ways to change research incentives to recognize the value of data sharing to science. In the current culture, those who create data sets and software or share their data often receive little acknowledgment for doing so. Data journals with citable output are one means of recognizing the contributions of data wranglers and the value of open data to science.

Other approaches could involve giving researchers who put data online an edge, with unique funding opportunities or institutional awards. Ultimately, new approaches will stand a better chance of success if they take into account the traditional culture of scientific research, where data are often seen as proprietary and long-term preservation of data is a low priority.

Digitizing and Archiving Data

Digitizing and integrating data pose another challenge for the field sciences community. Data management and storage were not always factored into research projects, and much of the field sciences involves observations that cannot be repeated. As a result, valuable historical records are often missing from repositories.

Despite the resources required to formally archive the vast amount of existing data and make them accessible, it is still more cost-effective to preserve these data than it is to try to fill gaps in records through new studies. To make this process easier, data repositories could help researchers properly cite, share, and secure their data by providing training and technical support.

The Challenge

Making data accessible and research reproducible calls for a massive cultural shift in science, one that requires changes in funding strategies, community values, and how science is done. For this shift to happen, everyone must be invested in it: funding agencies, researchers, data repositories, and journals.

We need new initiatives and approaches to facilitate data deposition and recognize the creators of data sets and software. Critically, we need these new initiatives funded. The barriers to opening field sciences data are large, but those involved with the field-based sciences are adept at solving difficult problems through creativity, collaboration, and development of new technologies.

Embracing open science improves discovery, access, integration, use, and value of field sciences data and samples. There may always be data hoarders and those who value opacity over transparency, but the acceptance of open data comes at a price only to those who do not want to confront the changing nature of science.

—Rebecca Fowler, Federation of Earth Science Information Partners, Boulder, Colo.; email: [email protected]

Citation: Fowler, R. (2016), Embracing open data in field-driven sciences, Eos, 97, doi:10.1029/2016EO047789. Published on 10 March 2016.

© 2016. The authors. CC BY-NC 3.0