Source: Geophysical Research Letters
Climate and ocean models use a series of equations to represent complex natural processes. However, the equations used in these models are often derived from limited observations and a series of assumptions.
Machine learning could be a powerful approach for analyzing data to directly “discover” the equations that underlie large, complex systems such as the changing biogeochemistry of the ocean. Machine learning has yet to be fully tested for equation discovery, however, meaning its capabilities and potential shortcomings aren’t fully known.
To better understand machine learning’s applicability for equation discovery, Wang et al. turned to an ocean biogeochemical model, seeing whether machine learning could re-create known equations governing colloidal iron (a key part of the ocean iron cycle) from a relatively sparse dataset. The machine learning technique discovered equations that performed comparably to the original equations used in the model while simultaneously uncovering new information about the underlying datasets and the iron cycle in general. This result is a step toward validating the use of equation discovery for other similarly complex processes in the real world, according to the authors.
The authors used a kind of equation discovery called symbolic regression, in which a machine learning model starts from a set of basic mathematical operators and searches for the combination of operators and variables that best describes a particular dataset. With this approach, the authors derived a suite of six equations describing how colloidal iron, which consists of microscopic suspended iron particles, behaves in the oceans. The equations discovered via symbolic regression differ from the known equations, but they are functionally simpler and reproduce large-scale patterns equally well, the authors say.
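To make the idea of symbolic regression concrete, here is a minimal sketch in Python. It is not the method or code used by Wang et al.: it enumerates every small expression built from a few operators instead of using the evolutionary search that many symbolic regression tools rely on, and it runs on hypothetical synthetic data generated from `y = 0.5*x*z + 0.1`. The predictor `s` is a hypothetical, nearly constant variable standing in for something like salinity, and `PENALTY` is an assumed per-node cost that favors simpler equations.

```python
import itertools

# Hypothetical synthetic data (NOT from Wang et al.): two varying predictors
# x and z, plus a near-constant predictor s standing in for a quantity that
# barely changes across samples. The "true" law is y = 0.5*x*z + 0.1.
N = 20
x = [0.5 + 0.1 * i for i in range(N)]
z = [1.0 + 0.05 * ((7 * i) % 20) for i in range(N)]
s = [35.0 + 0.001 * ((3 * i) % 5) for i in range(N)]
y = [0.5 * xi * zi + 0.1 for xi, zi in zip(x, z)]

LEAVES = {"x": x, "z": z, "s": s}
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
PENALTY = 1e-3  # assumed score cost per node; favors simpler equations


def grow(exprs):
    """Keep existing expressions and combine every pair with every operator.

    Each expression is (text, values at the N samples, node count).
    """
    out = list(exprs)
    for (op, fn), (sa, va, ca), (sb, vb, cb) in itertools.product(
        OPS.items(), exprs, exprs
    ):
        if op == "/" and any(abs(v) < 1e-12 for v in vb):
            continue  # skip divisions by (near) zero
        values = [fn(p, q) for p, q in zip(va, vb)]
        out.append((f"({sa} {op} {sb})", values, ca + cb + 1))
    return out


def fit_mse(f):
    """Least-squares fit of y ~ a*f + b; return the mean squared error."""
    mf, my = sum(f) / N, sum(y) / N
    var = sum((fi - mf) ** 2 for fi in f)
    cov = sum((fi - mf) * (yi - my) for fi, yi in zip(f, y))
    a = cov / var if var > 1e-12 else 0.0  # constant f: fall back to mean
    b = my - a * mf
    return sum((a * fi + b - yi) ** 2 for fi, yi in zip(f, y)) / N


def search(depth=2):
    """Return the expression with the lowest error-plus-complexity score."""
    exprs = [(name, vals, 1) for name, vals in LEAVES.items()]
    for _ in range(depth):
        exprs = grow(exprs)
    best = min(exprs, key=lambda e: fit_mse(e[1]) + PENALTY * e[2])
    return best[0], fit_mse(best[1])


if __name__ == "__main__":
    print(search())
```

On this synthetic data the search should recover `(x * z)` with a near-zero error, and the near-constant `s` never improves the fit enough to justify its extra complexity. The complexity penalty is a crude stand-in for the preference for simpler equations; real tools balance accuracy against complexity in more sophisticated ways.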
The equations also contain new insights into iron cycling: For example, they do not include salinity, likely because that variable does not change much throughout the ocean. The equations additionally show that full–water column sampling produces better results than sampling at specific depths, which can help guide future sampling work. Finally, the authors found that equations discovered from sparse datasets can be robust if colloidal iron data are measured where dissolved iron samples have already been taken.
Gaps in existing colloidal iron data highlight a need for future sampling to capture colloidal iron throughout the water column and to expand coverage of undersampled ocean basins, they argue. Scientists with unpublished iron speciation data from GEOTRACES cruises can help this effort by sharing their data, they add. (Geophysical Research Letters, https://doi.org/10.1029/2025GL121380, 2026)
—Nathaniel Scharping (@nathanielscharp), Science Writer

