Machine learning (ML) is ubiquitous in modern life—it’s the engine driving technologies like social networks, fraud detection, text translation, and speech recognition. Broadly speaking, ML is a branch of artificial intelligence that deals with designing algorithms that “learn from data.” The tasks tackled by ML algorithms are traditionally divided into three categories: classification (assigning a datum to a given class or category), regression (predicting a continuous value for an observable), and dimensionality reduction (finding relationships among variables).
In practice, ML is particularly appealing when a data set is highly dimensional, and hence hard to process with traditional statistical methods, or is so complex that human experts have limited insight. Learning can be either “supervised,” when the algorithm is trained on a large number of examples for which the answer is known, or “unsupervised,” when the algorithm infers patterns and relationships underlying the data set without being given known answers. Today’s ML renaissance resulted from the combination of large data sets, which keep growing over time, and more powerful computers, which can analyze them.
ML is expected to play an increasingly important role in scientific fields where data are pivotal. For example, last December NASA announced that Kepler space telescope data were used to discover new exoplanets with the help of deep learning. This was a supervised classification problem solved with neural networks, a massively parallel computing architecture loosely modeled on the human brain.
Many in the space science community anticipate that ML will have a profound effect on heliospheric physics in the foreseeable future. Space missions in the past few decades have returned large amounts of data encompassing remote, in situ, and ground-based observations. Space physics and space weather offer a tremendous opportunity to employ ML techniques that can disentangle highly dimensional data and detect patterns and causal relationships in complex nonlinear systems. To utilize these techniques to their fullest extent, however, space physicists need to be familiar with the language and tools of ML. Thus, a need for interdisciplinary collaborations has emerged.
To foster symbiosis and cross-fertilization across disciplines, a workshop brought together researchers from space weather, space physics, computer science, information science, ML, and data mining. The workshop was limited to about 40 participants to ensure personal interactions; the time was divided between lectures by topical experts and small breakout sessions. The lecture topics were balanced between space weather and various aspects and subfields of ML. Breakout sessions initially focused on identifying challenging problems in the solar and magnetospheric environment and subsequently involved applications of ML algorithms. The workshop was successful in creating lively discussions and a collaborative environment.
The following open challenges were addressed by the participants:
- understanding causality and reducing dimensionality in space data (remote, in situ, and ground based)
- dealing with large imbalances in space weather data (e.g., events and nonevents) to train forecasting models
- producing large catalogs of events with ML algorithms
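To illustrate the imbalance challenge listed above, here is a minimal Python sketch, not taken from the workshop itself. It assumes a hypothetical binary data set with 5 events and 95 nonevents, and the function names are illustrative. It shows why plain accuracy can mislead on imbalanced data, how the true skill statistic (hit rate minus false alarm rate) exposes a trivial forecast, and how inverse-frequency class weights could be computed to rebalance training.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def true_skill_statistic(y_true, y_pred):
    """TSS = TP / (TP + FN) - FP / (FP + TN), with 1 = event, 0 = nonevent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS needs at least one event and one nonevent")
    return tp / (tp + fn) - fp / (fp + tn)


# Hypothetical imbalanced sample: 5 events, 95 nonevents.
y_true = [1] * 5 + [0] * 95
# A trivial forecast that never predicts an event.
y_never = [0] * 100

trivial_accuracy = accuracy(y_true, y_never)       # high despite no skill
trivial_tss = true_skill_statistic(y_true, y_never)  # zero: no events caught
weights = balanced_class_weights(y_true)           # rare class weighted up

print(trivial_accuracy, trivial_tss, weights)
```

In this sketch the never-event forecast scores high on accuracy while its true skill statistic is zero, which is the reason imbalance-aware metrics and class weighting matter when training forecasting models on rare events.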
Many joint projects were discussed, and participants are expected to continue engaging after the workshop. A planned white paper geared toward funding agencies will raise awareness of the potential for ML applications in space weather. Finally, a coordinating committee was formed to prepare a general-public ML challenge in the upcoming months in the style of Kaggle.com, a Web-based platform for predictive modeling and data science competitions. A follow-up workshop is anticipated to take place in 2 years.
Workshop materials are available online.
We gratefully acknowledge the Lorentz Center for facilitating the workshop and for its financial support. We also acknowledge financial support from the Institut National de Recherche en Informatique et en Automatique, the Royal Netherlands Meteorological Institute, and the Netherlands Organization for Scientific Research cluster Nonlinear Dynamics in Natural Systems.
—Enrico Camporeale, Multiscale Dynamics, Centrum Wiskunde & Informatica, Amsterdam, Netherlands; Simon Wing, Johns Hopkins University Applied Physics Laboratory, Laurel, Md.; and Jay Johnson, Andrews University, Berrien Springs, Mich.