There’s a smartphone in nearly everyone’s pocket these days, and crowdsourced data are downright plentiful: photos, videos, and posts on Facebook or Twitter, to name just a few. Now, researchers have developed a new algorithm to pinpoint the geographic locations of natural disasters mentioned in tweets.
The method groups tweets according to the location each message most likely refers to, and it can help emergency personnel gauge the magnitude of a disaster and react quickly to deliver relief services, its developers say. Working in near real time, the algorithm complements traditional methods of investigation such as satellite observations, which can be impeded by cloud cover, smoke, or other obstructions.
“We need information about what is happening right now,” said Jeroen Aerts, a geographer at Vrije Universiteit Amsterdam (VU Amsterdam) and a member of the research team. “We have to rely on people who are actually in the area.”
“In a natural disaster, landlines might fail, a local cellular network may go down, but as long as a single connection to the internet remains, the disaster’s victims can get their calls for help out,” said Christopher Brown, a linguist at The University of Texas at Austin who was not involved in the research.
Researchers have previously used Twitter data to investigate natural disasters. At the American Geophysical Union’s 2016 Fall Meeting, for example, scientists reported that Twitter posts accurately mapped the extent of flooding in Japan.
Unfortunately, geotagging tweets isn’t as simple as recording the latitude and longitude of the sender’s computer or smartphone. That’s because the Twitter feature of attaching GPS coordinates to a tweet is turned off by default, which results in fewer than 1 in 100 tweets having associated coordinate information. So Jens de Bruijn, a geographer at VU Amsterdam, and his team turned to more creative ways of inferring the geographic origins of tweets.
De Bruijn and his colleagues started by collecting tweets about floods because the researchers had expertise in assessing flood risks worldwide. They gathered more than 35 million tweets published in 12 languages between 2014 and 2017. From this database, the researchers extracted the roughly 11 million tweets that mentioned one or more flood locations.
The researchers then faced a challenge: locations noted in individual tweets were often ambiguous. Did “Flooding houses! #BostonFlood” refer to the capital of Massachusetts in the United States, the town in eastern England, or the municipality in the Philippines?
Metadata to the Rescue
To overcome this uncertainty, de Bruijn and his collaborators analyzed the tweets’ metadata, ancillary information that includes a user’s time zone, hometown, and, if available, GPS coordinates. The team used these data to further refine the location referred to in a particular tweet.
For instance, if a user with a hometown of Waltham, Mass.—a city near Boston, Mass.—tweeted “Flooding houses! #BostonFlood,” the researchers concluded that the flooding was occurring in the United States as opposed to England or the Philippines.
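To make the Boston example concrete, here is a minimal sketch of one way metadata could break the tie. It is not the researchers' algorithm: it assumes a single hypothetical rule (pick the candidate place nearest the user's hometown), and the function names and approximate coordinates are illustrative only.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Approximate coordinates, for illustration only (assumption, not from the article).
CANDIDATES = {
    "Boston, Massachusetts, USA": (42.36, -71.06),
    "Boston, Lincolnshire, England": (52.98, -0.03),
    "Boston, Davao Oriental, Philippines": (7.87, 126.36),
}
WALTHAM = (42.38, -71.24)  # the tweeting user's hometown from profile metadata

def resolve(candidates, hometown):
    """Hypothetical rule: choose the candidate closest to the user's hometown."""
    return min(candidates, key=lambda name: haversine_km(candidates[name], hometown))

best_match = resolve(CANDIDATES, WALTHAM)
```

A real system would presumably weigh several signals together (time zone, GPS coordinates when present, and so on) rather than rely on distance to the hometown alone.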
“It’s an exciting area of research,” said Aerts. “It combines the natural sciences and the social sciences.”
Unlike other geotagging algorithms that analyze tweets on an individual basis, the one developed by de Bruijn and his colleagues assigns locations to groups of tweets. It lumps together tweets that share a keyword tied to a particular location and that were published close together in time. This method made it possible to geotag more tweets than would have been possible considering each tweet on its own, according to the team.
“It’s a group effort: A single tweet can only convey 140 characters, but together with hundreds or thousands of other related tweets, the available information adds up to a more actionable sum,” explained Brown.
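The grouping idea described above, lumping together tweets that share a location keyword and were published close together in time, can be sketched roughly as follows. The article does not give the team's actual time window or data structures, so the `max_gap` value, the `(keyword, timestamp)` tuple format, and the function name are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def group_tweets(tweets, max_gap=timedelta(hours=24)):
    """Group (keyword, timestamp) pairs by keyword, splitting a group
    whenever consecutive tweets are more than max_gap apart.
    The 24-hour default is an illustrative assumption."""
    by_keyword = defaultdict(list)
    for keyword, timestamp in tweets:
        by_keyword[keyword.lower()].append(timestamp)

    groups = []
    for keyword, stamps in by_keyword.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] <= max_gap:
                current.append(ts)      # close enough in time: same group
            else:
                groups.append((keyword, current))
                current = [ts]          # gap too large: start a new group
        groups.append((keyword, current))
    return groups

# Made-up sample tweets, for illustration only.
sample = [
    ("Boston", datetime(2016, 1, 1, 10)),
    ("boston", datetime(2016, 1, 1, 12)),
    ("Boston", datetime(2016, 3, 1, 9)),
    ("York", datetime(2016, 1, 1, 11)),
]
groups = group_tweets(sample)
```

Each resulting group could then be assigned a single location using the combined metadata of all its tweets, which is what lets sparse signals from individual users add up.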
Putting the Method into Practice
Using a control set of thousands of tweets that one researcher had manually geotagged, the scientists showed that their algorithm automatically and correctly geotagged approximately 70% of tweets. That’s a roughly twofold increase over other geotagging methods that analyze tweets on an individual basis, the team reported.
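A score of this kind is simply the share of control-set tweets whose automatic location matches the manual one. The sketch below assumes a hypothetical representation (dictionaries mapping tweet IDs to location names) and counts tweets the algorithm left untagged as incorrect; the article does not say how the team handled such cases.

```python
def geotag_accuracy(predicted, manual):
    """Fraction of manually geotagged tweets whose predicted location matches.
    Tweets missing from `predicted` count as incorrect (an assumption)."""
    if not manual:
        raise ValueError("control set is empty")
    correct = sum(1 for tweet_id, location in manual.items()
                  if predicted.get(tweet_id) == location)
    return correct / len(manual)

# Made-up toy labels, not data from the study.
manual_labels = {1: "Boston, USA", 2: "Jakarta", 3: "Chennai", 4: "York, England"}
algorithm_labels = {1: "Boston, USA", 2: "Jakarta", 3: "Mumbai"}
accuracy = geotag_accuracy(algorithm_labels, manual_labels)
```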
This new algorithm can churn through millions of tweets in a matter of hours, and de Bruijn and his team are working on how to best share their findings. “We want to see how we can use [this algorithm] in practice,” said Aerts. “We’ve been in contact with the Red Cross and the World Bank.”