It was the summer of 2011, and the Earth Science Information Partners’ (ESIP) meeting had just broken for lunch. Everyone was wandering through the dining area looking for a seat.
Unfortunately for him, Greg Leptoukh chose the empty seat next to me. At the time, I was finishing a Ph.D. in scientific computing and was validating a new method with an atmospheric science application. I had the computing part down but was a little uncertain about the atmospheric physics. Greg was an expert in atmospheric physics.
For the next 30 minutes Greg patiently listened to my ideas, corrected misunderstandings, and brilliantly suggested new paths to pursue. Greg even invited me to visit him a few months later when he learned I’d be at his institution, NASA’s Goddard Space Flight Center in Maryland. That was the type of person Greg was—an expert in his field who always had time and support for colleagues and students.
Greg was also a pioneer in informatics: a field that combines computing with specific domains of science. Over the past several decades, computing has become an inextricable part of the Earth and space sciences, leading to significant increases in productivity [Narock and Fox, 2012]. Yet the world of data can be a confusing place. Varying data formats, immense downloads, scientific visualizations, cloud computing, and data science present challenges even for experienced researchers. The field of informatics, with a breadth that encompasses computer science, information technology, human-computer interaction, and statistics, needs a common infrastructure and methodologies that enhance scientific work in the digital age.
Greg embodied the search for that common infrastructure through open collaborations. He embraced a nascent field and inspired others with his passion for evolving issues of data management. Each year since his death, those who capture that same spirit have been honored with an invitation to present the Leptoukh Lecture at AGU’s Fall Meeting. The lecture gives the geoscience community as a whole the chance to identify and support achievements in computational and data sciences.
Big Ideas for Big Data
Greg recognized early on that the Web could be used to search and analyze distributed heterogeneous data sets. He became one of the principal architects of Giovanni: the Goddard Earth Sciences Data and Information Services Center’s Interactive Online Visualization and Analysis Infrastructure [Acker and Leptoukh, 2007]. Giovanni seamlessly integrates NASA Earth science satellite data that scientists can access through a Web browser–based interface (an example is shown in Figure 1).
Researchers can use the system to analyze a wide range of phenomena. In the 14 years since its launch, Giovanni has been instrumental in producing over 1,000 scientific papers, such as a study on coral bleaching [Miranda et al., 2013], a study investigating phytoplankton variability [Houliez et al., 2013], and a study on carbon dioxide flux in the northeast Atlantic [Jiang et al., 2013].
In 2006 AGU recognized the issues that large-scale computing posed by forming the Earth and Space Science Informatics (ESSI) focus group. ESSI soon became a permanent AGU section, with the ultimate goal of evolving “data systems into knowledge systems that support the range of Earth and space science interests.”
Pushing the Boundaries of Informatics
Greg was an active member of the ESSI community, both at AGU and within the European Geosciences Union, where he researched data quality and reproducible science. Sadly, we lost Greg Leptoukh much too early when he passed away in 2012.
Many of us in the ESSI community wanted to recognize Greg’s scientific contributions and endearing personality, so we approached AGU about dedicating an annual lecture focused on data and computation. The inaugural Leptoukh Lecture was held in December of 2012 at AGU’s Fall Meeting, given by Chris Lynnes, a close friend and colleague at NASA’s Earth Science Data and Information System Project, who offered a remembrance of Greg and his work.
Every year since, AGU attendees have gathered to hear about advances in computation, instrumentation, and data handling, as well as the accomplishments of individual scientists. The Leptoukh Lecture “aims to raise awareness of the often-overlooked computational and data advances that enable breakthroughs in science.”
In 2016, Cynthia Chandler of the Woods Hole Oceanographic Institution described her experience with marine ecosystem research data and the challenges and strategies she has encountered in the stewardship of large and complex data sets. In 2013, Simon Cox, a geophysicist at the Commonwealth Scientific and Industrial Research Organisation, illustrated how informatics can impact fields across the spectrum. The standardized tools and methodologies he helped develop are now being used across several environmental fields and even in air traffic control.
As we contend with the human impacts of climate change, innovations in computational and data science are facilitating adaptation. Digital tools are helping communities monitor air quality and drought, find available drinking water, and determine habitat vulnerability. Dawn Wright of the Environmental Systems Research Institute presented the notion of digital resilience in her 2015 lecture: If digital tools are to continue helping communities, those tools must be built with the capacity to deal effectively with the threats from rapidly changing environments.
The grand challenges of climate science severely stress our computational infrastructure. New remote sensing and in situ techniques coupled with a desire for ever-greater simulation resolution present significant problems in computation and data handling. In 2014, Leptoukh lecturer Bryan Lawrence of the U.K. National Centre for Atmospheric Science pointed out that the complex worldwide climate simulation projects, Coupled Model Intercomparison Project Phase 5 (CMIP5) and Phase 6 (CMIP6), progressed in large part because of advances in environmental informatics.
Whereas some informatics techniques are increasing the quality of the output, advances in data collection technology have increased the quality of the input, which was the topic of the 2017 lecture by Kirk Martinez of the University of Southampton. He noted how miniaturization, multisensor integration, and increasingly powerful downlink systems are just a few advances that have contributed significantly to the field of environmental observation networks. Today’s instruments have efficient energy management and can withstand harsh environments and remote locations, such as glaciers. Informatics advances led to the first subglacial sensor probes with custom electronics and protocols. Sensor systems in the mountains of Scotland have demonstrated complete Internet and Web integration. These new data streams are advancing climate change research and have positively affected the domain of cryospheric science.
These past lectures show the innovation and creativity of the field of informatics and highlight the significant progress being made as Earth and space scientists advance through the digital age. In today’s world, issues of data management, large-scale computation, and modeling affect each of AGU’s sections. The Leptoukh Lecture, along with the ESSI section, helps foster the solutions to these issues, enabling transparent and reproducible Earth and space science.
Join the Discussion at the 2018 Fall Meeting
ESIP, where I first met Greg, has a slogan: “making data matter.” Data do matter, and in this increasingly digital world so too does the technical infrastructure we build on top of them. The Leptoukh Lecture aims to raise AGU members’ awareness of this technical infrastructure and the scientific breakthroughs it has enabled.
One of the exceptional individuals who have contributed to informatics and data science is Ben Evans, associate director of research engagement and initiatives at NCI Australia, who will be presenting the 2018 Leptoukh Lecture. Ben’s presentation, “Evolving Data-Driven Science: The Unprecedented Coherence of Big Data, HPC, and Informatics, and Crossing the Next Chasm,” will address best practices in data management, data quality, and FAIR (Findable, Accessible, Interoperable, and Reusable) principles.
I hope you’ll join me at Fall Meeting 2018 on Wednesday, 12 December, at 10:20 a.m. to recognize his team’s accomplishments, and I hope you’ll join all of us in ESSI in continuing to recognize computational and data advances that affect the whole of the Union.
The author is indebted to Peter Fox and Mark Parsons of Rensselaer Polytechnic Institute as well as Karen Moe (NASA Goddard Space Flight Center) for several helpful discussions and comments during the writing of this article.