Artificial intelligence (AI) methods have emerged as useful tools in many Earth science domains (e.g., climate models, weather prediction, hydrology, space weather, and solid Earth). AI methods are being used for tasks of prediction, anomaly detection, event classification, and onboard decision-making on satellites, and they could potentially provide high-speed alternatives for representing subgrid processes in climate models [Rasp et al., 2018; Brenowitz and Bretherton, 2019].
Although the use of AI methods has risen sharply in recent years, we caution that applying them in Earth science calls for vigilance and for the development of best practices. Without best practices, inappropriate use of these methods might lead to “bad science,” which could create a general backlash in the Earth science community against the use of AI methods. Such a backlash would be unfortunate because AI has much to offer Earth scientists, helping them sift through and gain new knowledge from ever-increasing amounts of data. Thus, it is time for the Earth science community to develop thoughtful approaches for the use of AI.
Easy Access to Powerful New Methods
Setting up and running experiments with AI methods used to require sophisticated computer science knowledge. This is no longer the case. The recent success of AI in other domains has spurred the development of free and highly efficient software packages that are extremely easy to learn and use. Even complex artificial neural networks can be set up in a few lines of code, and countless tutorials and examples are available to guide the novice user. Furthermore, as algorithms become more efficient and computational power gets cheaper and more available on the cloud, access to high-performance computing is no longer a limiting factor. All of these developments bring powerful AI methods to Earth scientists’ fingertips.
Earth scientists have a long tradition of using methods based on physics (e.g., dynamical models) and sophisticated statistics (e.g., empirical orthogonal function analysis and spectral analysis). They have thus accepted statistical methods, which are a type of data-driven method, as useful tools. However, the sudden rise in Earth science of AI methods (another type of data-driven method), coupled with a terminology and culture unfamiliar to Earth scientists, may make AI methods seem more foreign than they actually are. AI simply provides an extended set of new data-driven methods, many of which are derived from statistical principles. For example, one basic type of artificial neural network (as used in deep learning) is essentially a linked series of linear regression models interspersed with scalar nonlinear transformations.
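To make the last point concrete, here is a minimal sketch of such a network written with nothing but the Python standard library. The function names (`linear`, `forward`), the choice of `tanh` as the scalar nonlinearity, and the hand-picked weights are all assumptions made for illustration; in practice the weights would be learned from training data.

```python
import math

def linear(x, weights, bias):
    """One multi-output linear regression: y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def forward(x, layers):
    """Chain linear maps, applying tanh element by element between them."""
    for k, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if k < len(layers) - 1:  # no nonlinearity after the last layer
            x = [math.tanh(v) for v in x]
    return x

# Hypothetical, hand-picked parameters (normally learned from training data).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # 2 inputs -> 2 hidden units
    ([[2.0, 1.0]], [0.1]),                    # 2 hidden units -> 1 output
]

prediction = forward([0.3, 0.3], layers)
print(prediction)
```

With a single layer, `forward` reduces to ordinary linear regression; the nonlinear steps between layers are what allow the chained model to represent relationships that a single regression cannot.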
We address here the question of how best to leverage both physics-based and data-driven methods simultaneously by outlining several proposed steps for researchers. For brevity we use only the term “AI methods” below, although most of our discussion applies equally to all data-driven methods.
Step 1: Ask Guiding Questions
We suggest that Earth scientists ask themselves the following questions before choosing a specific AI approach:
- Why exactly do I want to use AI for my application? Is this application for prediction, understanding, or both? The answer is important for choosing an AI method that satisfies the desired trade-off between transparency and performance.
- How can scientific knowledge be integrated into the AI method? There are many ways to combine expert scientific knowledge of underlying physical processes (e.g., physics and chemistry) with AI methods; every effort should be made to merge these approaches, as discussed further in step 2.
- Which tools from explainable AI are available? The emerging field of explainable AI (XAI) provides many new tools for the visualization and interpretation of AI methods [Samek et al., 2019]. McGovern et al., for example, show the enormous potential of these tools for weather-related applications. These tools have the potential to transform the use of AI methods in Earth science by increasing transparency and thus building trust in their reasoning.
- Does my approach generalize to address all conditions in which it will be used? AI methods rely on “training data” to learn the characteristics of a system. Special attention must be paid to testing and ensuring that the resulting AI model works under changing conditions, including regime shifts. Generalization can be greatly enhanced by fusing scientific knowledge into the AI method, and it can be tested by methods ranging from cross validation to the AI technique of generating adversarial examples.
- Is my approach reproducible? Am I following the findable, accessible, interoperable, and reusable (FAIR) data principles? Is my method easily accessible to the community?
- What am I hoping to learn in terms of science? Because of the complexity and abstract nature of many AI techniques, the answer to this question is crucial for setting up the problem and method so that the AI approach yields something to learn and the results can be interpreted. The availability of explainable AI tools discussed above creates many novel opportunities to gain new scientific insights into Earth science processes (E. A. Barnes et al., Viewing forced climate patterns through an AI lens, submitted to Geophysical Research Letters, 2019).
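The generalization question above can be illustrated with a small synthetic check. The sketch below is only an illustration under stated assumptions: the "true" process (`truth`), the two regimes, and the straight-line model are all hypothetical stand-ins. It runs a five-fold cross validation inside the regime covered by the training data and then evaluates the same kind of model on a shifted regime it has never seen.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def rmse(model, xs, ys):
    """Root-mean-square error of the fitted line on (xs, ys)."""
    a, b = model
    return (sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

def truth(x):
    return x ** 2  # hypothetical nonlinear process

# Regime 1: the only conditions present in the training data.
x1 = [i / 100 for i in range(100)]      # 0.00 .. 0.99
y1 = [truth(x) for x in x1]
# Regime 2: shifted conditions the model will meet in use.
x2 = [2 + i / 100 for i in range(100)]  # 2.00 .. 2.99
y2 = [truth(x) for x in x2]

# Five-fold cross validation inside regime 1 (interleaved folds).
k = 5
cv_errors = []
for fold in range(k):
    train = [i for i in range(len(x1)) if i % k != fold]
    val = [i for i in range(len(x1)) if i % k == fold]
    model = fit_line([x1[i] for i in train], [y1[i] for i in train])
    cv_errors.append(rmse(model, [x1[i] for i in val], [y1[i] for i in val]))
cv_rmse = sum(cv_errors) / k

# Out-of-regime test: fit on all of regime 1, evaluate on regime 2.
shift_rmse = rmse(fit_line(x1, y1), x2, y2)
print(f"in-regime CV RMSE: {cv_rmse:.3f}, out-of-regime RMSE: {shift_rmse:.3f}")
```

In this toy setting the cross-validation error looks reassuring while the out-of-regime error is far larger, which is why a validation split drawn from the same conditions as the training data cannot, on its own, demonstrate that a model will hold up under a regime shift.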
We encourage researchers to thoroughly reflect on these questions to select the best AI method for their application. Furthermore, to promote substantial advances in scientific research, editors of Earth science journals may need to create guidelines for the review of AI-focused manuscripts to ensure that findings are explained clearly and placed in the context of existing Earth science. Likewise, editors might encourage comparison to standard or simpler approaches to discern the scientific advances that AI offers.
Step 2: Explore Fusing Scientific Knowledge into AI
The field of theory-guided data science investigates ways in which AI and scientific knowledge can be combined into hybrid algorithms that incorporate the best of both worlds [Karpatne et al., 2017]. Earth science applications have several properties that make it imperative to integrate scientific knowledge as much as possible, such as
- the desire of Earth scientists to gain scientific insights rather than just “get numbers” from an algorithm
- the availability of extensive existing knowledge
- the high complexity of the Earth system
- the limited sample size and lack of reliable labels in many Earth science applications
Integrating scientific knowledge into AI approaches greatly improves transparency because the more scientific knowledge is used, the easier it is to follow the reasoning of the algorithm. Generalization, robustness, and performance are also improved because the scientific knowledge can fill many gaps left by small sample sizes, as explained below.
How can we integrate scientific knowledge? One excellent practice uses a two-step approach. For a given task, first identify all subtasks that can easily and efficiently be addressed by physics-driven methods, and apply those. Although this guideline may appear obvious, it is often overlooked, likely because applying an off-the-shelf AI algorithm to the entire task appears to require less work than carefully analyzing and addressing several subtasks.
For remaining subtasks, consider using AI algorithms while still leveraging scientific knowledge. Most AI methods have some kind of optimization procedure at their core; that is, they search for an optimal model over a large parameter space. Without using scientific knowledge, the search space is often large, and many solutions may exist, only some of which might be physically meaningful. By leveraging known physical relationships (e.g., by including them as constraints in the optimization problem), the optimization is guided toward only physically meaningful solutions (T. Beucler et al., Enforcing analytic constraints in neural-networks emulating physical systems, arXiv:1909.00912). This approach offers additional benefits: Convergence tends to be faster because of the smaller search space, and the resulting method tends to generalize better because of the use of established physical relationships. Thus, although this approach requires more work at the outset, it tends to result in much better overall solutions.
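A toy example may help show how a physical relationship narrows the search. The sketch below is not the method of the work cited above; it is a simple soft-penalty illustration under hypothetical assumptions. Two pathways with weights `w1` and `w2` are driven by the same input, so the data determine only their sum, and the assumed physical relation `w1 = 2 * w2` is added to the loss as a penalty term. The function `fit`, the data, and the relation itself are all invented for illustration.

```python
def fit(xs, ys, penalty_weight, steps=2000, lr=0.05):
    """Gradient descent on mean squared error plus an optional physics penalty.

    Model: y = w1 * x + w2 * x, so the data alone fix only w1 + w2.
    Assumed (hypothetical) physical relation: w1 = 2 * w2.
    """
    w1, w2 = 3.0, -1.0  # arbitrary starting point
    n = len(xs)
    for _ in range(steps):
        # Gradient of the data misfit (identical for w1 and w2).
        g = sum(2 * ((w1 + w2) * x - y) * x for x, y in zip(xs, ys)) / n
        # Gradient of penalty_weight * (w1 - 2 * w2) ** 2.
        c = w1 - 2 * w2
        g1 = g + penalty_weight * 2 * c
        g2 = g - penalty_weight * 4 * c
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return w1, w2

# Synthetic observations generated with w1 + w2 = 3.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]

w_free = fit(xs, ys, penalty_weight=0.0)  # data only: many equally good fits
w_phys = fit(xs, ys, penalty_weight=1.0)  # data plus the assumed relation
print("unconstrained:", w_free)
print("constrained:  ", w_phys)
```

Both fits reproduce the observations, but the unconstrained one simply settles on whichever solution lies closest to its starting point, here with a negative `w2`, whereas the penalized one is drawn to the single solution consistent with the assumed relation. A soft penalty of this kind only encourages the relation; enforcing it exactly requires building it into the model or the optimizer.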
Step 3: Foster Interdisciplinary Collaboration and Education
Innovative approaches, such as fusing scientific knowledge and AI methods, require deep knowledge integration across disciplinary boundaries [Pennington, 2015]. This integration is best achieved by close collaboration between Earth scientists and AI researchers (see guidelines for such collaborations). The Association for Interdisciplinary Studies organizes conferences and provides general strategies for effective interdisciplinary collaboration.
It might not always be possible to find suitable collaborators, so one option is to join learning communities, such as the National Science Foundation–sponsored EarthCube Research Coordination Network IS-GEO: Intelligent Systems Research to Support Geosciences. The increasing number of sessions (e.g., coordinated by AGU’s Earth and Space Science Informatics section), workshops (e.g., Climate Informatics), and conferences (e.g., the American Meteorological Society’s Conference on Artificial Intelligence for Environmental Science) dedicated to AI research in Earth science is encouraging, yet there is still a large need for additional events that engage Earth science and AI researchers simultaneously and build bridges between these communities.
Furthermore, there is a tremendous need to develop guidelines and best practices to educate the future Earth science workforce to be well prepared for innovative, interdisciplinary research bridging Earth science and AI. Numerous institutions are starting to incorporate data science and AI courses into their curricula (e.g., Cornell’s Institute for Computational Sustainability and National Science Foundation Research Traineeship programs at the University of Chicago, University of California, Berkeley, and Northwestern University). The community can support these efforts by collecting and disseminating educational resources and by developing guidelines on which topics are most beneficial to integrate into Earth science education [Pennington et al., 2019].
The availability of AI methods provides many new and exciting research avenues for Earth scientists, but it also requires the community to reflect on when and how these methods should be used because using such methods without careful consideration can lead to bad science. The two most promising safeguards against such bad science are the integration of scientific knowledge into AI methods and the use of visualization tools to maximize their transparency.
Support was provided by National Science Foundation grants AGS-1749261 (E.A.B.) and AGS-1445978 (I.E.).