The 20th-century statistician George Box is widely credited with the remark that “essentially all [statistical] models are wrong, but some are useful.” And it’s true: As abstractions of the real world, models can only generalize the systems we study. The same holds for remotely sensed data, often collected via satellite or aircraft.
One way that scientists reconcile statistical models with remotely sensed observations is through data assimilation. In short, data assimilation continuously compares newly received data with an existing model, which is then updated to reflect the new contributions. The hydrologic sciences commonly use data assimilation to evaluate soil moisture from satellite imagery; nearly half of the participants in NASA’s Soil Moisture Active Passive (SMAP) Early Adopter Program, for example, use the method to extract information from the mission’s remote sensing products.
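To make the compare-and-update cycle concrete, here is a minimal sketch of a single assimilation step for one variable, written as a scalar Kalman-style update. This is an illustration only, not the algorithm used in the study: the function name `assimilate` and all numbers are hypothetical, and operational systems work with many variables and more elaborate error models.

```python
def assimilate(prior_mean, prior_var, obs, obs_var):
    """Blend a model estimate with one observation (scalar Kalman-style update).

    All arguments are illustrative: the means are in the variable's own units
    and the variances describe how much each source is trusted.
    """
    # The gain weights the observation by the relative uncertainty of the model.
    gain = prior_var / (prior_var + obs_var)
    # Move the model estimate toward the observation in proportion to the gain.
    post_mean = prior_mean + gain * (obs - prior_mean)
    # The updated estimate is never more uncertain than the prior.
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# Hypothetical example: a model soil moisture estimate updated by three retrievals.
state, var = 0.30, 0.04 ** 2
for retrieval in [0.26, 0.27, 0.25]:
    state, var = assimilate(state, var, retrieval, obs_var=0.04 ** 2)
```

Each pass pulls the model state toward the new observation and shrinks its variance, which is the sense in which the model is "updated to reflect the new contributions."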
Uncertainty rears its head everywhere in this type of analysis: in the data assimilation algorithm, in the Earth observation data, and in the field (in situ) data used to validate the dynamic model. But just how much uncertainty exists in any given scenario is still unclear.
In a new study, Nearing et al. tackled data assimilation efficiency and outlined a framework to measure the amount of information lost in dynamic models. As an example, they fed remotely sensed soil moisture data from the Land Parameter Retrieval Model into the Noah Multiparameterization land surface climate model. They then compared the model’s estimates of volumetric soil water content to in situ data from the U.S. Department of Agriculture’s Soil Climate Analysis Network (SCAN).
In the soil moisture experiment, two results stood out. First, the model and assimilated data combined explained less than 20% of the information contained in the field data; the remote sensing data alone captured only 5% of the variability in the SCAN data. This inefficiency indicates that the remotely sensed data added little to the point data already available through SCAN. Second, the data assimilation algorithm used just 5% of the information that was available from the remote sensing instrument. This result suggests that almost none of the assimilated remote sensing data informed the land surface model.
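One common way to put a number on the "fraction of information explained" is to divide the mutual information between an estimate and the field data by the entropy of the field data. The sketch below does this for binned values using only the Python standard library. It is an assumption that this resembles the study's metrics; the estimator the authors actually used may differ, and the function names and bin settings here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length label sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def bin_values(values, n_bins, lo, hi):
    """Map continuous values to integer bins on [lo, hi]; the bin choice is illustrative."""
    width = (hi - lo) / n_bins
    return [max(0, min(int((v - lo) / width), n_bins - 1)) for v in values]


def fraction_explained(estimate, truth):
    """Share of the entropy in `truth` that is shared with `estimate` (0 to 1)."""
    h_truth = entropy(truth)
    return mutual_information(estimate, truth) / h_truth if h_truth > 0 else 0.0


# Hypothetical usage: bin both series the same way, then compare.
# field = bin_values(field_soil_moisture, n_bins=10, lo=0.0, hi=0.5)
# model = bin_values(model_soil_moisture, n_bins=10, lo=0.0, hi=0.5)
# fraction_explained(model, field)
```

A value near 1 would mean the estimate carries nearly all of the information in the field data; a value near 0 would mean it carries almost none. Histogram-based estimates like this are sensitive to bin width and sample size, so the result should be read as indicative rather than exact.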
The primary purpose of the research was not to highlight the limitations of data assimilation, however, but rather to demonstrate the applicability of the evaluation metrics. In this case, the metrics worked: They showed that the authors’ assimilation process was inefficient. The procedure that they outlined can be applied to a variety of geophysical problems and highlights where statistical models can be improved. Whether it’s revamping sensor technology or tinkering with the data assimilation algorithm, the presented framework can help future researchers reveal the inefficiencies in their work. (Water Resources Research, https://doi.org/10.1029/2017WR020991, 2018)
—Aaron Sidder, Freelance Writer