Intense wildfires are on the rise in many parts of the world. Streets in many coastal communities flood at high tide. And droughts threaten to drive some farmers out of business. Increasingly, ordinary citizens are viewing climate change less as a hypothetical concept that has little effect on them personally and more as a disturbing new reality. To meet growing demands for information about the risks that communities face, the quality and validity of data on which updated assessments of the state of the climate are based must continue to improve, and new tools are needed to access those data.
Periodically since the enactment of the Global Change Research Act of 1990, the U.S. federal government has issued the National Climate Assessment, a review of climate science and current and projected climate impacts. Not surprisingly, the November 2018 release of the second volume of the Fourth National Climate Assessment (NCA4), which focuses on impacts and risks, made the front pages of newspapers across the country. Since then, the scientific and popular press has cited this report’s findings innumerable times. Each National Climate Assessment is the product of an extensive ongoing effort, involving numerous experts and agencies, to collect, curate, validate, and document the sources and descriptors for a mountain of data.
Building a Solid Foundation
NCA4 is produced by the U.S. Global Change Research Program (USGCRP), with contributions from more than a dozen federal agencies and hundreds of scientists inside and outside the government. It is a vital reference for scientists, educators, decision-makers, and everyone interested in climate change as it affects the United States.
Because the NCA reports are so important and influential, a great deal of effort goes into enhancing their quality and usefulness. NCA4 was thoroughly reviewed by the National Academies of Sciences, Engineering, and Medicine; by the public; and by the 13 member agencies of the USGCRP. The findings are backed up by extensive documentation, called “traceable accounts,” of the process and the evidence base used to arrive at the findings, including the remaining uncertainties in the scientific record. All the information in the assessment, as well as other USGCRP information products, is documented in a database called the Global Change Information System (GCIS).
By fully documenting and making accessible the sources of all scientific information in USGCRP assessments, GCIS enhances the value of these reports in three ways. First, it increases the credibility of the report. Second, it enhances reproducibility because the data and processes used to arrive at the conclusions are transparent to users. Third, it makes the report more flexible as a decision support tool because the information is more discoverable within the report and because the data and processes used can be tailored to specific users’ needs.
GCIS, a web-based open-source portal that manages metadata for USGCRP reports, develops and documents the provenance and relationships of the data produced by these reports in machine-readable formats. GCIS not only serves as a metadata repository but also tracks individual entities in the system, called “resources,” which can include chapters or figures in other reports, web pages, scientific databases, and other sources of information. GCIS also provides connections between individual resources.
Collecting Data About the Data
In the data world, it is easy to mismanage a wide assortment of sources in the transition from data to information and from information to knowledge. This is where metadata come in. Constructing a good set of metadata starts with collecting information on what data were created and why, where, how, and when they were created; curating those metadata according to well-established standards; and providing quality assurance before the report is released to the public. Generally, a good metadata set should answer questions about the source data:
- Are the data reusable?
- Can I generate the output with a different set of variables?
- Is the source trustworthy?
- Are the data interoperable? That is, do the systems and services that create, exchange, and use the data have clear, shared expectations for their contents, context, and meaning?
GCIS follows international as well as U.S. federal open data standards and principles wherever possible. The NOAA-sponsored Technical Support Unit (TSU) of the North Carolina Institute for Climate Studies (NCICS) ensures that NCA information included in GCIS follows federal standards, and it is responsible for much of the metadata collection, quality control, and dissemination.
How FAIR Is GCIS?
Making well-informed connections between entities requires the adoption of metadata standards and principles like the findable, accessible, interoperable, and reusable (FAIR) guidelines. FAIR debuted in 2016 as a set of data principles designed and endorsed by a diverse set of stakeholders. These principles emphasize the machine readability of data and metadata, with the intention of making this information more findable, accessible, interoperable, and reusable with minimal human intervention.
GCIS emphasizes the findability and reusability aspects of the FAIR standards. The research methods of GCIS are focused on individual resources in the model. Each type of resource in GCIS follows an internally defined convention, informed by external standards and focused on establishing findability and reusability.
Making GCIS data and metadata findable involves assigning globally unique and persistent internal and external identifiers to each resource in the system. Findability also requires that data be described with rich, complete, and accurate metadata, and these metadata must clearly and explicitly include the identifier of the data they describe. Data and metadata must also be registered or indexed in a searchable resource. For simple and advanced searches, GCIS uses technology provided by search.gov.
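The findability requirements above can be illustrated with a minimal sketch. It is not GCIS code: the `Resource` class, the identifiers, and the `build_index` and `search` helpers are hypothetical, and the in-memory dictionary merely stands in for a real searchable index. The sketch enforces two of the rules described above, namely that every record carries its own identifier and that identifiers are unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical metadata record; field names are illustrative only."""
    identifier: str          # persistent identifier carried inside the metadata
    kind: str                # e.g., "figure", "dataset", "chapter"
    title: str
    keywords: tuple = ()


def build_index(resources):
    """Register resources by identifier, rejecting missing or duplicate IDs."""
    index = {}
    for resource in resources:
        if not resource.identifier:
            raise ValueError(f"record without identifier: {resource.title!r}")
        if resource.identifier in index:
            raise ValueError(f"duplicate identifier: {resource.identifier}")
        index[resource.identifier] = resource
    return index


def search(index, term):
    """Return identifiers of records whose title or keywords mention the term."""
    term = term.lower()
    return sorted(
        r.identifier
        for r in index.values()
        if term in r.title.lower() or any(term in k.lower() for k in r.keywords)
    )


# Made-up example records.
records = [
    Resource("figure/example-drought-map", "figure", "Example drought map",
             ("drought", "agriculture")),
    Resource("dataset/example-tide-gauges", "dataset", "Example tide gauge data",
             ("flooding", "coasts")),
]
index = build_index(records)
matches = search(index, "drought")
```

Rejecting duplicates at registration time, rather than at search time, keeps every identifier resolvable to exactly one record.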
Ensuring reusability requires that data and metadata are richly described with many accurate and relevant attributes. Metadata must be well researched and sourced, and they must be reliable. Data and metadata are released with a clear and accessible data usage license, which provides information on data usage and copyright restrictions. Every entity or resource within GCIS includes detailed documentation of the provenance of the data and metadata. GCIS data and metadata also follow domain-relevant federal and international data standards.
Consider the Source: Documenting Provenance
Ensuring that users know and trust information linked together from many sources requires that the information be reliable. As important as it is to create accurate metadata, it is also necessary to connect and interlink metadata records using a machine-readable vocabulary to make them more sustainable in the web of data. GCIS provides just such a linked open data platform for accessing climate data resources related to the National Climate Assessments.
Data provenance, or origin, tells users the story of the data with a relevant attribution of facts. Breaking down text reports into smaller objects and creating links between these entities and internal or external objects not only make the data more transparent but also inform users about how, why, and whether the outputs from these data are reproducible.
As a part of linking data to external trusted repositories, GCIS must be coordinated with other standardized registries (e.g., DOI, GCMD keywords, ISSN, ISBN, ORCID, PubMed). The more connections the data have with standardized registries and libraries, the more reliable they are considered to be. Using identifiers from these standardized registries saves considerable time when matching records between two systems. For instance, using digital object identifiers (DOIs) in GCIS enables the importation of metadata for thousands of cited reference records in a matter of minutes.
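To see why shared identifiers make matching fast, consider the following sketch. It is an illustration under stated assumptions, not the GCIS import process: the function names, the record layout, and the DOIs (all under a made-up `10.1234` prefix) are hypothetical. The only facts relied on are that DOIs begin with `10.`, contain a slash, and are case insensitive, so records can be joined on a normalized DOI string instead of being compared by title or author.

```python
# Common ways a DOI is written in citations; stripped during normalization.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Reduce a DOI written in any common form to a bare lowercase string."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def match_by_doi(local_records, registry_records):
    """Pair local and registry records that share a DOI.

    Both arguments are lists of dicts with a "doi" key (hypothetical layout).
    Returns (matched_pairs, unmatched_local_records).
    """
    registry = {normalize_doi(r["doi"]): r for r in registry_records}
    matched, unmatched = [], []
    for record in local_records:
        hit = registry.get(normalize_doi(record["doi"]))
        if hit is None:
            unmatched.append(record)
        else:
            matched.append((record, hit))
    return matched, unmatched


# Made-up records: the same work cited in two different DOI styles.
local = [
    {"doi": "doi:10.1234/EXAMPLE.One", "title": "Example reference"},
    {"doi": "10.1234/example.three", "title": "Unregistered reference"},
]
external = [
    {"doi": "https://doi.org/10.1234/example.one", "year": 2018},
    {"doi": "10.1234/example.two", "year": 2017},
]
matched, unmatched = match_by_doi(local, external)
```

Because the lookup is a dictionary access per record, the cost grows linearly with the number of references, which is consistent with importing thousands of records quickly.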
Good provenance links should also inform users about outdated data sets or processes. For example, if a data set has not been used to create a figure since a prior National Climate Assessment, it may no longer be current. GCIS uses semantic vocabularies to provide users with knowledge about the relationship between current and outdated data sets.
GCIS publishes the processes and data creation methods used to create graphical images in the climate assessments. For example, GCIS defines an “activity” as a process used to create a graphic or image from a data set.
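The chain from figure to activity to data set can be sketched as a small set of subject, predicate, object statements. The resource names below are invented, and the predicate names are borrowed from the W3C PROV-O vocabulary purely for illustration; whether GCIS records these exact terms for a given resource is an assumption. The sketch walks from a figure to the data sets behind it and lists data sets that a newer revision has replaced.

```python
# Hypothetical provenance statements: (subject, predicate, object).
TRIPLES = [
    ("figure/example-temperature-map", "prov:wasGeneratedBy",
     "activity/example-map-process"),
    ("activity/example-map-process", "prov:used",
     "dataset/example-station-temps"),
    ("activity/example-map-process", "prov:used",
     "dataset/example-gridded-temps-v2"),
    ("dataset/example-gridded-temps-v2", "prov:wasRevisionOf",
     "dataset/example-gridded-temps-v1"),
]


def objects(triples, subject, predicate):
    """All objects linked to a subject by the given predicate, in order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def source_datasets(triples, figure):
    """Follow figure -> activity -> data set links; return data sets used."""
    datasets = []
    for activity in objects(triples, figure, "prov:wasGeneratedBy"):
        for dataset in objects(triples, activity, "prov:used"):
            if dataset not in datasets:
                datasets.append(dataset)
    return datasets


def superseded(triples):
    """Data sets that some newer data set declares itself a revision of."""
    return {o for _, p, o in triples if p == "prov:wasRevisionOf"}


sources = source_datasets(TRIPLES, "figure/example-temperature-map")
outdated = superseded(TRIPLES)
```

Keeping the activity as its own node, rather than linking figures directly to data sets, leaves room to attach the method details that a reader would need to reproduce the graphic.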
GCIS relies on the NCICS TSU for the accuracy and completeness of much of the NCA metadata. The TSU metadata survey system collects federally mandated information on the processes used to create an image from a data set. The TSU meticulously quality controls this information, working closely with the scientists or authors of the resource, before depositing it into GCIS. The TSU also disseminates this information in the web-based NCA reports through a user-friendly metadata viewer.
GCIS puts extensive effort into handling research data to comply with these provenance standards. Current work focuses on filling gaps in GCIS data and on increasing the traceability, accessibility, reusability, and reproducibility of the system. To find data gaps, GCIS assesses itself using a curated scoring scheme, which is informed by the standards of the individual resources as well as by the connectivity of these resources with one another.
For example, a “person” entity in GCIS (that is, a contributor such as an author or editor of a report) is considered a “very good” resource if certain information fields are provided. These include first and last names, a persistent identifier like ORCID, and the URL of an external biography. By evaluating scores of individual resources in GCIS, it is possible to narrow the focus of research and curation, ensuring that the data are FAIR and linked.
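A scoring rule of this kind might look like the sketch below. Only the “very good” tier comes from the description above; the lower tiers, the field names, and the function names are assumptions made for illustration and are not the actual GCIS scheme. The ORCID check uses the ISO 7064 MOD 11-2 checksum that ORCID identifiers carry in their final character, so a mistyped identifier is treated as missing.

```python
def is_valid_orcid(orcid):
    """Check the 16-character form and the MOD 11-2 check character."""
    chars = orcid.replace("-", "").upper()
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[15] == expected


def score_person(person):
    """Rate a hypothetical person record (a dict) by metadata completeness.

    "very good" follows the article; the other tiers are illustrative.
    """
    has_names = bool(person.get("first_name")) and bool(person.get("last_name"))
    has_orcid = is_valid_orcid(person.get("orcid") or "")
    has_url = bool(person.get("url"))
    if has_names and has_orcid and has_url:
        return "very good"
    if has_names and (has_orcid or has_url):
        return "good"        # assumed tier
    if has_names:
        return "fair"        # assumed tier
    return "poor"            # assumed tier


# Made-up record; 0000-0002-1825-0097 is ORCID's documented sample identifier.
complete = {
    "first_name": "Example",
    "last_name": "Author",
    "orcid": "0000-0002-1825-0097",
    "url": "https://example.org/bio",
}
rating = score_person(complete)
```

Sorting resources by such a rating would point curators first to the records that are missing the most information.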
In the past 3 years, the process and standards under which GCIS operates throughout its data life cycle have steadily improved. For example, during the data collection process, the NCICS TSU requires contributors to the report to answer a series of questions in order to provide an accurate provenance for the resource to be accepted for inclusion in the final report. The curation and research methods used in meeting FAIR data standards have also been updated. The metadata scoring scheme described above also helps reveal where our regular data collection and curation processes are falling short.
Improvements in GCIS are usually based on information gathered about evolving user needs, and the GCIS development team communicates with stakeholders to plan those improvements. For example, in a field as fast moving as climate change research, users may wish to re-create graphical aids with a newer set of data or custom parameters. GCIS empowers users to do so by linking the sources of data to provide them with avenues to explore the research further. These metadata links allow users to see the pathway from data source to final product.
Connecting scientists, data creators, data providers, and data curators is essential to communicating the science of climate change effectively. By continuously evaluating and revising our processes, the USGCRP and the NOAA TSU have improved traceability within the metadata cycle, from the scientists to the users, with each successive climate assessment.
A Sustainable Data Resource
The process guidance for the Third National Climate Assessment emphasized the importance of improving methods to assess confidence and uncertainty in scientific information for decision-making. It is important to create and maintain a platform and a system that support maximum traceability, accessibility, and usability, but that is not enough. It is also important to manage the metadata for climate assessments with reliable and sustainable standards and curation methods in a timely manner to enhance the credibility of the underlying data. Such metadata management assists in producing reliable climate assessments that reach a broader audience and more reliably inform decisions.
To make climate data more sustainable, GCIS is focusing on the continuous improvement of the metadata. GCIS is working with the scientific community toward setting high standards for documenting provenance and enabling reproducibility, with the aim of further improving GCIS and future National Climate Assessments.
Amrutha Elamparuthy and Reid Sherman, Straughan Environmental, U.S. Global Change Research Program, Washington, D.C.