A graphical assessment of solar energetic particle forecasts produced by the Proton Prediction System developed at the US Air Force Research Laboratory, in this case forecasts based on observed solar flare X-ray peak intensity. The yellow histogram bins indicate events that were forecasted to exceed the nominal event threshold of 10 proton flux units (pfu) as marked by the dashed line, whilst blue bins indicate events not so forecasted. The yellow bins to the left of the dashed line are one example of the challenge addressed by this manuscript. Should such forecasts, with an outcome falling just below the event threshold, be assessed as false alarms, or do we need more nuanced assessments that recognize these as near hits? Credit: Kahler and Darsey [2021], Figure 1
Source: Space Weather

As space weather forecasting matures, more attention is being given to how we verify and validate those forecasts. A simple approach is to assign each forecast in a set to one of four classes: (a) an event is forecast and occurs (a hit), (b) an event is forecast but does not occur (a false alarm), (c) an event occurs but was not forecast (a miss), or (d) no event is forecast and none occurs (a correct null). The counts of forecasts in each class are then formed into a contingency table from which a wide range of metrics can be derived. This simple approach is sensitive to any quantitative threshold used to identify an event, for example whether the particle flux exceeds some level. If a forecast event peaks 1 per cent below the threshold, it will be counted as a false alarm. Thus, we need more nuanced metrics that take account of this and other cases where thresholds are narrowly missed.
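The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the Proton Prediction System or the authors' code: the function names, the made-up sample of (forecast, observed peak flux) pairs, and the choice of three commonly used scores (probability of detection, false alarm ratio, critical success index) are all assumptions made here. Only the 10 pfu threshold comes from the figure caption.

```python
from collections import Counter

THRESHOLD_PFU = 10.0  # nominal event threshold quoted in the figure caption

CELLS = ("hit", "false_alarm", "miss", "correct_null")


def classify(forecast_event, observed_peak_pfu, threshold=THRESHOLD_PFU):
    """Assign one forecast to a contingency-table cell."""
    # An event is taken to occur when the peak flux exceeds the threshold.
    observed_event = observed_peak_pfu > threshold
    if forecast_event and observed_event:
        return "hit"
    if forecast_event:
        return "false_alarm"
    if observed_event:
        return "miss"
    return "correct_null"


def contingency_table(pairs, threshold=THRESHOLD_PFU):
    """Count how many forecasts fall in each of the four cells."""
    counts = Counter(classify(f, peak, threshold) for f, peak in pairs)
    return {cell: counts.get(cell, 0) for cell in CELLS}


def skill_scores(table):
    """Derive three common metrics; NaN where a denominator is zero."""
    h, f, m = table["hit"], table["false_alarm"], table["miss"]
    nan = float("nan")
    return {
        "POD": h / (h + m) if h + m else nan,          # probability of detection
        "FAR": f / (h + f) if h + f else nan,          # false alarm ratio
        "CSI": h / (h + f + m) if h + f + m else nan,  # critical success index
    }


# Hypothetical sample: (event forecast?, observed peak flux in pfu)
sample = [
    (True, 150.0),  # hit
    (True, 9.9),    # false alarm, although only 1 per cent below threshold
    (True, 0.5),    # false alarm
    (False, 40.0),  # miss
    (False, 0.2),   # correct null
    (False, 1.0),   # correct null
]

table = contingency_table(sample)
scores = skill_scores(table)
print(table)
print(scores)
```

In this sample the 9.9 pfu case is penalised exactly as heavily as the 0.5 pfu case, which is the threshold sensitivity described above.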

Kahler and Darsey [2021] directly address this problem by exploring how the counts in a contingency table might be weighted to take account of event sizes. The authors apply their ideas in the context of solar energetic particle event forecasts driven by observed intensities of solar flares and solar radio bursts.

Their results suggest that weighted counts may better assess forecasts of frequent small events close to the event threshold, and are less useful for the prediction of occasional large events. Thus, an event peaking just below the threshold might be recognized as a hit with low weight, a more realistic assessment than a false alarm.
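One way to picture size-weighted counts is the sketch below. It is not the scheme used by Kahler and Darsey [2021]: the logarithmic weight, the reference levels `FLOOR_PFU` and `FULL_PFU`, and the sample data are all hypothetical choices made purely for illustration. Each forecast contributes a total weight of one, split between two cells according to the size of the observed event, so a forecast event peaking just below 10 pfu counts partly as a hit rather than wholly as a false alarm.

```python
import math

# Hypothetical reference levels, chosen only for this illustration.
FLOOR_PFU = 1.0    # at or below this peak flux the event weight is 0
FULL_PFU = 100.0   # at or above this peak flux the event weight is 1


def event_weight(peak_pfu, floor=FLOOR_PFU, full=FULL_PFU):
    """Map an observed peak flux to a weight in [0, 1] on a log scale."""
    if peak_pfu <= floor:
        return 0.0
    w = math.log10(peak_pfu / floor) / math.log10(full / floor)
    return min(1.0, w)


def weighted_table(pairs):
    """Build a contingency table of fractional (weighted) counts."""
    table = {"hit": 0.0, "false_alarm": 0.0, "miss": 0.0, "correct_null": 0.0}
    for forecast_event, peak_pfu in pairs:
        w = event_weight(peak_pfu)
        if forecast_event:
            # Forecast made: larger observed events count more as hits.
            table["hit"] += w
            table["false_alarm"] += 1.0 - w
        else:
            # No forecast: larger observed events count more as misses.
            table["miss"] += w
            table["correct_null"] += 1.0 - w
    return table


# Hypothetical sample: (event forecast?, observed peak flux in pfu)
sample = [
    (True, 150.0),
    (True, 9.9),
    (True, 0.5),
    (False, 40.0),
    (False, 0.2),
    (False, 1.0),
]

weighted = weighted_table(sample)
print(weighted)
```

Because every forecast still contributes a total of one, the weighted cells sum to the number of forecasts, and the usual contingency-table metrics can be computed from them unchanged.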

As the authors note, the paper shows that this is an important issue for further work, in particular the development of more robust approaches to the verification and validation of forecasts. Such work is timely, not only because space weather forecasting is maturing, but also because it can take account of recent advances in the verification of meteorological forecasts, such as the flexing approach presented by Sharpe [2016].

Citation: Kahler, S. W., & Darsey, H. [2021]. Exploring contingency skill scores based on event sizes. Space Weather, 19, e2020SW002604. https://doi.org/10.1029/2020SW002604

―Michael A. Hapgood, Editor, Space Weather

Text © 2021. The authors. CC BY-NC-ND 3.0
Except where otherwise noted, images are subject to copyright. Any reuse without express permission from the copyright owner is prohibited.