The inconsistent ways in which vast amounts of scientific data are stored make their discovery, linking and reuse difficult for researchers. The “Enabling FAIR Data” project will develop a set of data management best practices for publishing in the Earth and space science community. Credit: iStock / liuzishan

Scientific results, particularly in the Earth and space sciences, are increasingly dependent on large and complex data sets, as well as models that transform these data. In many cases, the data are difficult to acquire, such as one-time observations of the Earth or other planets that cannot be repeated. The 40-year-old data from the Voyager mission are one example.

Increasingly, these data sets are stored or made available separately from the actual publication because of their sheer size. But even when data are saved, the ways in which they are stored and catalogued are uneven, making it difficult or impossible to discover and link data sets that should be allied. For example, data are often stored with publishers as PDF files or other supplements without any metadata, or in general repositories without any quality control or curation.

Other researchers are often unable to understand the data sets without contacting the original author, if that is even possible. For examples of such difficulties, see a recent Editor’s Vox by the Editor-in-Chief of Journal of Geophysical Research: Oceans.

There is, at best, nascent interoperability across different data repositories and between repositories and scholarly publishers. Domain repositories, which act as a home for data from particular disciplines, have emerged to handle different data sets, but standards and procedures vary between them.

Over the past few years, several organizations, including AGU, have led efforts to address key parts of these problems at a high level, such as the Center for Open Science’s Transparency and Openness Promotion (TOP) Guidelines, the FAIR (Findable, Accessible, Interoperable, and Reusable) Data Principles, and the Data Citation Principles.

In particular, AGU’s Brooks Hanson and Kerstin Lehnert of Columbia University helped form the Coalition on Publishing Data in the Earth and Space Sciences (COPDESS), which includes a statement of commitment signed by major publishers and repositories about requiring data to be included with publications. As a result, most Earth and space science publishers now require authors to make data in support of a publication available, preferably using domain-specific repositories that provide high-quality data curation. However, the practice of these and other guidelines is haphazard, in part because workflow solutions to connect researchers, publishers, and repositories are neither standard nor widely adopted.

To address this critical need, the Laura and John Arnold Foundation recently awarded a grant to a coalition of groups representing the international Earth and space science community. This project, which is being convened by AGU, will develop standards that will connect researchers, publishers and data repositories in the Earth and space sciences to enable FAIR (findable, accessible, interoperable, and reusable) data – a concept first developed by Force11 – on a large scale.

The project is called “Enabling FAIR Data.” The partnership currently includes AGU, Earth Science Information Partners (ESIP), and Research Data Alliance (RDA), and has support from the Proceedings of the National Academy of Sciences, Nature, Science, National Computational Infrastructure, AuScope, the Australian National Data Service, and the Center for Open Science.

This effort will build on the work of COPDESS, ESIP, RDA, the scientific journals, and domain repositories to ensure that well-documented data, preserved in a repository with community-agreed metadata and support for persistent identifiers, become part of the expected research products submitted in support of each publication. It is expected that the broader community will play a key role in shaping the recommended guidelines and approach. A key goal is to make a process that is efficient and standard for researchers and thus supports their work from grant application through to publishing.

A set of best practices will be developed including metadata and identifier standards; data services; common taxonomies; landing pages at repositories to expose the metadata and standard repository information; standard data citation; and standard integration into editorial peer review workflows to facilitate adoption by publishers and a consistent experience for researchers. Visit the COPDESS website to keep up-to-date with the project.

Open, accessible, and high-quality data, and related products such as software, are critical to the integrity of published research. They ensure transparency, support reproducibility and are necessary for the advancement of science. In Earth and space science, critical data can also have diverse and important societal benefits and be used for critical real-time decision-making.

AGU’s Data Position Statement affirms that “Earth and space sciences data are a world heritage. Properly documented, credited, and preserved, they will help future scientists understand the Earth, planetary, and heliophysics systems.” By convening the Earth and space science community in this exciting new project, AGU continues to lead the way in developing best practices in scientific research and publishing, and in making that research accessible for the benefit of humanity.

—Shelley Stall, Director, Data Programs, American Geophysical Union; email: sstall@agu.org; ORCID: orcid.org/0000-0003-2926-8353

Citation:

Stall, S. (2017), Enabling findable, accessible, interoperable, and reusable data, Eos, 98, https://doi.org/10.1029/2018EO081907. Published on 15 September 2017.

Text © 2017. The authors. CC BY-NC-ND 3.0
Except where otherwise noted, images are subject to copyright. Any reuse without express permission from the copyright owner is prohibited.