For many years, geophysicists have been constructing a picture of the ever-changing interior structure of Earth beneath our feet by observing how various types of waves (including seismic waves and electromagnetic waves) travel through solid rock, melts, and fluids within these materials. More recently, they have begun studying electromagnetic waves to determine how Earth’s electrical conductivity affects the magnitudes and directions of geomagnetically induced currents in electrical power grid systems, which can cause electrical blackouts.
Magnetotellurics (MT) is a field of geophysical research that employs electromagnetic waves that diffuse through Earth to make inferences about Earth’s structure. These oscillations of electrical and magnetic fields are of the same nature as the visible light waves, microwaves, and radio waves that we’re familiar with in our everyday lives, but they have a much lower frequency (longer wavelength) than these more familiar wave types.
MT, a younger sibling of seismology, bases these inferences on ground-level measurements and physics-based modeling of electromagnetic waves from large-scale natural sources: solar wind and lightning [e.g., Chave and Jones, 2012]. The electrical conductivity of Earth’s crust and mantle is a parameter complementary to seismic wave velocities, and it is an indicator of what fraction of the material in a wave’s path is partially melted and whether solute-bearing fluids (such as water and carbon dioxide) are present. The MT method is also increasingly relevant to space weather applications, which is where Earth’s electrical conductivity and its effects on geomagnetically induced currents factor in.
However, historical MT data are difficult to access. Over the years, the field’s close ties to the oil and gas, mineral exploration, and geothermal industries, together with the wide variety of historical data formats, have worked against open data sharing.
Another stumbling block for data sharing has been the lack of a means to assign credit for data collection and processing, which has been especially problematic for early-career researchers. A data set could not be cited in a publication or otherwise formally credited to its authors, and it was not easy to track the usage of the data.
Here we describe efforts that aim to address these problems in meaningful ways.
A Boost for the MT Field
Magnetotellurics in the United States received a significant boost in 2006 when the National Science Foundation funded USArray MT, the MT component of the nationwide EarthScope geoscience data collection effort [Schultz et al., 2006–2018]. Even though the MT component has a much smaller budget than the seismic component, the program has nevertheless been able to collect long-period data of an unprecedented quality and coverage, for the first time penetrating deep into the upper mantle.
We have designed a more uniform, sharable data format for the USArray MT data, recent and historical, and several years ago we expanded our efforts to archive MT data contributions from the wider community. This effort has resulted in the Incorporated Research Institutions for Seismology Electromagnetic Transfer Functions (IRIS EMTF), the first online system for long-term storage, discovery, and sharing of electromagnetic transfer functions worldwide.
Magnetotellurics Comes of Age
The first comprehensive academic treatises on the MT method appeared in the early 1950s, developed simultaneously and independently in France, Japan, and the USSR. However, it wasn’t until the 1970s that the method achieved widespread practical use.
Like the rest of the natural sciences, the MT method has undergone dramatic transformation in the digital age. Robust data analysis techniques had been mostly developed by the end of the 1980s, but the availability of interpretation methods determined what information types were stored for posterity. For example, in the 1980s it was common practice to apply ad hoc corrections to data to ensure compatibility with a state-of-the-art, one-dimensional interpretation (i.e., conductivity as a function of depth).
In the 1990s, the data were most commonly obtained along dense profiles and rotated to assumed strike to facilitate two-dimensional interpretation (conductivity as a function of depth and of distance across strike). These days, we are firmly grounded in three-dimensional interpretation methods. We’re only now reaching the stage where our computational resources permit creative combinations of multiple data sets in large-scale, high-resolution geophysical inversions for mapping ground structure, so the time is ripe for reinterpretation of some of the historical data sets.
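To illustrate what rotating data to an assumed strike involves, here is a minimal Python sketch. It assumes the common convention Z' = R Z R^T for a 2 x 2 complex impedance tensor, where R is a rotation by the strike angle. The function names and sample values are hypothetical, and sign conventions differ between processing codes.

```python
import math

def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotate_impedance(z, theta_deg):
    """Rotate a 2x2 complex impedance tensor by theta_deg (assumed convention)."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    r = [[c, s], [-s, c]]      # rotation matrix R
    r_t = [[c, -s], [s, c]]    # its transpose R^T
    return matmul2(matmul2(r, z), r_t)

# Hypothetical two-dimensional tensor in strike coordinates: zero diagonal.
z_strike = [[0j, 2 + 1j], [-1 - 0.5j, 0j]]
# The same tensor expressed in axes rotated 30 degrees away from strike.
z_measured = rotate_impedance(z_strike, -30.0)
# Rotating by the assumed strike angle brings the diagonal back to zero.
z_rotated = rotate_impedance(z_measured, 30.0)
```

In this idealized case the diagonal terms vanish only when the assumed strike is right. That is why data stored after rotation, without a record of the original orientation, are hard to reinterpret in three dimensions.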
Keeping Data Safe (in a Drawer)
The USArray MT data set currently covers about two thirds of the continental United States (Figure 1) at a quasi-regular 70-kilometer spacing. An important feature of the USArray MT project was the requirement that both the raw and processed data must be immediately available for public use.
Our responsibilities at Oregon State University included making the newly collected USArray MT data accessible to the public in a sustainable and searchable manner. We were also interested in locating and obtaining any historical MT data that might complement this new information.
We quickly learned that traditional MT data formats for storage and sharing of these important data sets were varied and rather fluid. No versatile, freely available readers existed; instead, a great number of home-brewed reading and writing tools proliferated.
All historical data were missing certain pieces of critical information (such as orientation and even geographic location) necessary to interpret the data in a way that might be different from what the original project intended. The only path to obtaining a data set was to track down the data author and ask nicely—a practice that typically worked well for old data if the tapes that used to store the files hadn’t been lost over the years. For newer data, the authors were reluctant to share. Overall, there was no culture of data sharing in the MT community in the United States, let alone internationally.
To overcome these obstacles, we saw the need to build a self-descriptive data format and a sharing platform for MT data. This need was particularly pressing for the final products of magnetotelluric and related electromagnetic data processing, which are called electromagnetic transfer functions (EMTFs). These data are critical for MT interpretation and are therefore truly precious. Fortunately, EMTFs do not require much storage space, so we could focus on the completeness of the metadata rather than the efficiency of the information storage.
New Data Interchange Standard and a Searchable Public Database
To mitigate the problems related to heterogeneity of historical file formats and lack of critical metadata, we turned to the Extensible Markup Language (XML) specification. We designed a format of our own, EMTF XML, that is capable of storing the information from any EMTF data file without loss of content.
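As a rough illustration of what "self-descriptive" means in practice, the Python sketch below parses a small XML record that carries site location and orientation alongside the transfer function values. The tag and attribute names are hypothetical and chosen for readability; they are not the actual EMTF XML schema, and the numbers are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: tag names are illustrative, NOT the EMTF XML schema.
RECORD = """<TransferFunction>
  <Site id="EX001">
    <Latitude>44.5</Latitude>
    <Longitude>-123.3</Longitude>
    <Orientation units="degrees">0.0</Orientation>
  </Site>
  <Period units="s" value="100.0">
    <Z name="Zxy" real="1.25" imag="-0.75"/>
  </Period>
</TransferFunction>"""

def read_site(xml_text):
    """Return the site metadata stored with the data as a dictionary."""
    site = ET.fromstring(xml_text).find("Site")
    return {
        "id": site.get("id"),
        "latitude": float(site.findtext("Latitude")),
        "longitude": float(site.findtext("Longitude")),
        "orientation_deg": float(site.findtext("Orientation")),
    }

def read_impedances(xml_text):
    """Return {(period_s, component): complex value} for every Z element."""
    values = {}
    for period in ET.fromstring(xml_text).findall("Period"):
        t = float(period.get("value"))
        for z in period.findall("Z"):
            values[(t, z.get("name"))] = complex(float(z.get("real")),
                                                 float(z.get("imag")))
    return values

site = read_site(RECORD)
impedances = read_impedances(RECORD)
```

Because the metadata sit in the same file as the values, a reader written decades later can recover position and orientation without contacting the original authors.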
We have accompanied this development with open source conversion tools, which we wrote in Fortran 90. These conversion tools accommodate the complete variability of the historical file formats and convert them to the new EMTF XML data format (and back, as may be necessary to comply with the formerly established software tools of the MT community).
The most common historical format to date, the Society of Exploration Geophysicists (SEG) Data Interchange Standard 1987, also known as the EDI [Wight, 1988], came in many flavors, making it hard to homogenize the intricate EMTF metadata (Figure 2), such as processing details, precise position, and orientation. The new framework is an excellent fit to fully and concisely represent the EMTF data and metadata in a way that is consistent with long-term storage requirements, further interpretation (i.e., inversion), peer-to-peer sharing, and searchable databases.
After we designed the EMTF XML data format for the USArray MT data, we then solicited MT data contributions from the community (Figure 3). We shouldered the burden of the data format conversion and metadata collection to produce the first online system for long-term storage, discovery, and sharing of electromagnetic transfer function data worldwide (IRIS EMTF).
The database [Kelbert et al., 2011] is a result of collaboration between Oregon State University, the Geomagnetism Program at the U.S. Geological Survey (USGS), and the IRIS Data Management Center (DMC). The initial content of the database included hundreds of USArray MT data sites, but it has since been expanded to include contributions from historical data campaigns, restored from a variety of formats to homogeneous and readable XML files.
Credit Where Credit Is Due
Far from an afterthought, credit attribution is at the forefront of the challenge of data sharing. A lot of hard work goes into data collection and processing. The only traditional venue for credit attribution is through publications that make use of the data. Hence, researchers have been reluctant to give their data away “for free,” unless the data set has been exhausted academically. Oftentimes, by the time this is true, the data, or critical metadata, are already beyond recovery.
More recently, some research communities have started to experiment with the idea of data citation. In this paradigm, a data set is considered a stand-alone scientific contribution, citable in publications in much the same way as a peer-reviewed paper would typically be cited. This provides a great incentive for data sharing and many side benefits, which include the ability to track the usage of any particular data set and notify the users of data updates.
We have opted to provide a unique digital object identifier (DOI) for each data survey: a collection of sites that shares common authors and a scientific goal. Thanks to the new procedure devised at USGS and implemented at the IRIS DMC, every data survey submitted to the IRIS EMTF database is assigned a complete citation once its metadata have been verified, and it becomes a citable scientific contribution (Figure 4).
To make this possible, we reach out to the authors of every data set that we archive to obtain the critical information for the data citation, such as the authors, years of data collection, title, acknowledgements, and selected publications. We then create DOIs for each contribution.
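The fields listed above map naturally onto a small record from which a citation string can be assembled. The Python sketch below is only an illustration: the class name, field names, citation layout, and placeholder DOI are all hypothetical and do not reflect the procedure used at USGS or the IRIS DMC.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SurveyCitation:
    """Hypothetical container for the citation metadata of one data survey."""
    authors: List[str]
    first_year: int   # first year of data collection
    last_year: int    # last year of data collection
    title: str
    doi: str

    def format(self) -> str:
        """Build a citation string in an assumed, illustrative layout."""
        if self.first_year == self.last_year:
            years = str(self.first_year)
        else:
            years = f"{self.first_year}-{self.last_year}"
        names = ", ".join(self.authors)
        return f"{names} ({years}). {self.title}. doi:{self.doi}"

# Placeholder values only; the DOI below is not a real identifier.
example = SurveyCitation(
    authors=["A. Author", "B. Author"],
    first_year=2010,
    last_year=2012,
    title="Example MT Survey",
    doi="10.0000/placeholder",
)
citation = example.format()
```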
We found that this is a learning process for everyone involved but that our colleagues appreciate the opportunity to receive credit for their work. Of note, the authorship of the data sets often turns out to be substantially different from that in related project publications.
Impact on Magnetotellurics and Other Disciplines
The new publicly accessible and searchable historical and modern MT database opens previously unexplored avenues for data exploration, survey design, inversion of geophysical data, Earth system understanding, and teaching. It has also turned into a primary resource for researchers in space weather and for the power grid industry, whose access to MT data is a prerequisite to developing accurate nationwide systems for geomagnetic hazard mitigation. We envision that this database will also be useful for researchers in other areas of geophysics, such as seismology, who would like to use real MT data for hands-on learning and joint inversion analysis (using processed seismic and EM data to obtain Earth structure properties consistent with both types of data).
The assignment of DOIs to data surveys upon submission to the database is an important leap in MT data sharing, providing authorship attribution and a data usage tracking capability. We hope that as the MT community warms up to the new data citation practices, this database will provide a new way for the authors to receive credit for the hard work of data collection. The database also provides a venue for release of historical commercial data for academic use.
A side benefit of this project is that our flexible, extensible, and self-descriptive data format, the EMTF XML, may now become the de facto standard in magnetotellurics, gradually displacing the EDI and other historical formats. The new format, which is complete and easy to use, allows for streamlined file exchange in the community, and the additional metadata are invaluable in the long term. We hope that this effort has helped to create a more open data sharing culture in the geosciences, one community at a time.
We gratefully acknowledge the open source Fortran XML (FoX) library developers, Toby White and Andrew Walker. We are grateful to Xavier Garcia for kindly sharing his EDI SPECTRA reading and rotation code with us. This code has served as an invaluable point of reference. We thank Gary D. Egbert, Adam Schultz, and the worldwide magnetotelluric community for valuable contributions toward EMTF XML format development and for making their data openly available. We are grateful to the IRIS Data Management System Data Product Development effort, whose funding and programming support have allowed the creation of the initial XML data format and database. The population of the database with historical data sets has been made possible through NSF award 1463855. We thank C. A. Finn, J. J. Love, J. McCarthy, and J. L. Slate for reviewing a draft of this article.