Ocean scientists will now find it easier to track deep-sea data from disparate sources: introducing SeaView, a new central home for ocean data that strings together five online databases.
The central idea of SeaView is to unify vast streams of data from different sets into one easily searchable set. The project, led by the Scripps Institution of Oceanography, is the ocean arm of a broader initiative called EarthCube. The latter effort is funded by the U.S. National Science Foundation to design and develop the cyberinfrastructure—information systems, databases, software, and tools—needed to support Earth and planetary sciences in the coming decades.
Making data from various collections searchable under the same platform will help researchers who seek to transform data into knowledge. “We’re trying to reduce the time to science,” said Steve Diggs, data curator at the Scripps Institution of Oceanography.
Diggs and other scientists from Scripps presented a poster that gave an overview of SeaView on 14 December at the American Geophysical Union’s Fall Meeting in San Francisco, Calif. Earlier that day, SeaView officially launched online.
What If All Data Could Speak the Same Language?
SeaView pulls data from five databases: Rolling Deck to Repository (R2R), Ocean Observatories Initiative, Biological and Chemical Oceanography Data Management Office (BCO-DMO), Ocean Biogeographic Information System (OBIS), and CLIVAR and Carbon Hydrographic Data Office (CCHDO). Together they encompass data on biological, chemical, physical, and geological properties of the majority of the world’s oceans.
To unify data from the different sources, researchers at SeaView first sought to understand why data have varied formats. Most of these sources are collections of data taken at sea by research cruises, moored sensors, and ocean gliders. The formatting differences stem from the nature of the research itself.
“Ships are interesting; they have different scientists on board who have different methods,” said Diggs. “Right when they capture the data, they start to look different.” The same holds for data from moorings and gliders—differences in project design often mean that similar data get formatted differently from project to project.
Further, once data are split off to be housed in different repositories, there’s no way to bring them back together, explained Karen Stocks, director of the Geological Data Center at Scripps and the lead author of the poster. To help get around this, her group is now implementing standard identifiers, so that metadata from the same cruise or mooring network all get the same code. This will help data users down the line better understand where information came from.
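To make the idea of a shared identifier concrete, here is a minimal sketch in Python. It is not SeaView's actual scheme: the repository names come from the article, but the field names, the cruise codes, and the `group_by_cruise` helper are all hypothetical, invented only to show how one common code lets records from separate repositories be matched up again.

```python
from collections import defaultdict

# Hypothetical metadata records. The repository names appear in the article;
# the field names and cruise codes are invented for illustration only.
records = [
    {"repository": "R2R", "cruise_id": "CRUISE-001", "kind": "ship track"},
    {"repository": "BCO-DMO", "cruise_id": "CRUISE-001", "kind": "nutrient samples"},
    {"repository": "CCHDO", "cruise_id": "CRUISE-002", "kind": "hydrographic profiles"},
    {"repository": "R2R", "cruise_id": "CRUISE-002", "kind": "ship track"},
]


def group_by_cruise(records):
    """Collect records that share a cruise identifier, whatever their source."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["cruise_id"]].append(record)
    return dict(grouped)


by_cruise = group_by_cruise(records)

# List which repositories hold data for each cruise.
for cruise_id, items in sorted(by_cruise.items()):
    sources = ", ".join(sorted({item["repository"] for item in items}))
    print(f"{cruise_id}: {sources}")
```

Without a shared code, the same grouping would depend on matching ship names, dates, or free-text descriptions, which is exactly the kind of guesswork the identifiers are meant to remove.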
Building SeaView has helped the team understand what scientists need to do differently to make source data easier to integrate. The team is now bringing that wisdom back to the original data warehouses so that those repositories can start to standardize their approaches to collecting data. Such consistency will help future data integration, which will ultimately help scientists find more patterns in the data, Stocks noted.
Data Themes: The Ocean North of Hawaii and the Gulf Stream
In addition to integrating data from the five online catalogs, researchers at SeaView are curating information for specific regions.
“We’re not becoming a new data center that just takes the data from everybody else,” said Stocks. “But we’re developing thematic collections about particular kinds of science that researchers are doing.”
Working directly with ocean scientists through a series of focus groups and workshops to understand their priorities, the team identified the two regions of highest scientific interest. One is in the Pacific Ocean north of Oahu, Hawaii, and the other is off the U.S. east coast, within the Gulf Stream.
“When we were selecting sites, we looked for places where there was a lot of science interest and a large enough set of data across the different repositories,” Stocks explained. The team then collected data from each region into a unified set, so that “when you put it together, the whole may be greater than the sum of the parts.”
Active, multidisciplinary ocean research is going on in both regions. The Hawaiian collection, for example, contains data from 266 cruises at BCO-DMO, 255 cruises at CCHDO, microbial data from the OBIS package, and metadata from 278 cruises at R2R. The Gulf Stream collection contains a similar volume of data.
The two specialized collections contain data that scientists want integrated right now, Diggs explained. “There are many scientists developing new proposals to do new work” in the regions, Stocks added. “Having a broad set of accessible, usable data will help them develop new science ideas,” she said.
Putting SeaView to Work
Efforts such as SeaView will help scientists effectively share data across disciplines, explained Stace Beaulieu, biological oceanographer at Woods Hole Oceanographic Institution. “Success is when we can integrate our own resources with others and get a better understanding of, in my case, marine ecosystems,” she said.
Beaulieu noted that the next challenge would be to effectively communicate SeaView output, so that scientists across disciplines can become aware of the integration happening at the repository level.
Ultimately, the goal is to render SeaView obsolete. “SeaView making these [integrations] for you shouldn’t have to happen,” Stocks said. “The repository should have everything clean and integrated to begin with.”
Until then, however, SeaView is on hand to fill the gap.