Extracting Reliable Twitter Data for Flood Risk Communication using Manual Assessment and Google Vision API from Text and Images

While Twitter has been touted to provide up-to-date information about hazard events, the reliability of tweets is still a concern. Our previous publication extracted relevant tweets containing information about the 2013 Colorado flood event and its impacts. Using the relevant tweets, this research further examined the reliability (accuracy and trueness) of the tweets by examining the text and image content and comparing them to other publicly available data sources. Both manual identification of text information and automated (Google Cloud Vision API) extraction of image content were implemented to balance accurate information verification and efficient processing time. The results showed that both the text and images contained useful information about damaged/flooded roads/street networks. This information will help emergency response coordination efforts and informed allocation of resources when enough tweets contain geocoordinates or location/venue names. This research will help identify reliable crowdsourced risk information to enable near-real-time emergency response through better use of crowdsourced risk communication platforms.


Increased frequency and severity of climate-related hazards (e.g., floods, wildfires, hurricanes, and heat waves) and anthropogenic hazards (e.g., mass shootings, epidemics) have brought attention to social media platforms (e.g., Twitter), thereby enriching and challenging traditional communication, especially during emergency management phases [6]. From the socio-psychological perspective, the reasons that generally drive people to share information on social media are self-efficacy, self-fulfilment, altruism, social engagement, reciprocity, and reputation [7,8]. Driven by these reasons, numerous scenarios have used social media platforms to warn the public about disasters and report knowledge and data sets [15,16]. Although both citizen science and crowdsourcing engage socio-culturally diverse and geographically dispersed citizens for data and knowledge creation/collection, each has subtle differences [17,18]. While crowdsourcing remains an ill-defined approach that uses large networks of people, citizen science solely uses scientists, volunteers, and lay people with interests and knowledge about a specific topic [19]. Because tweets are generated via crowdsourcing and tend to contain rumors and hoaxes, we assumed the tweets to be inaccurate and implemented a hierarchical approach to verify the reliability and relevance of the tweets using scientifically derived and confirmed data.

Despite the importance of social media in risk communication, there are challenges that need to be addressed. First, information overload due to massive amounts of user-generated content can overwhelm users in discerning relevant information [20]. Second, crowdsourced social media data often lack metadata that provide information about the creator, time, date, device used to generate the data, purpose, and standard, making them less credible [21-23]. Third, robot-controlled social media accounts, commercial spam, and collective attention spam/misinformation, which have emerged with the prevalence of social media [24,25], could also impede the quality of crowdsourced data. Finally, heuristics play a significant role in deciding what or whether to share information on social media. This becomes influential during complicated and unanticipated crisis situations, thereby contributing to the possibility of introducing errors and biased judgements into shared risk information [26]. These challenges are more pronounced in the case of crowdsourced sites.

Even when the above issues are controlled, information relevance determines the usability of social media crisis information. Thus, evaluating the relevance of social media content is critical [10], and hence, it is paramount to assess the quality and trustworthiness of data to ensure the information shared is accurate and true for decision making and public consumption during a crisis. The goal of this research is to extract risk information from tweets during the 2013 Colorado flood and assess the reliability (accuracy and trueness) of this information. This was done by examining the text and image content and comparing the content to publicly available information from federal, state, and local governments and emergency management agencies.

Data reliability can be defined as "the accuracy and completeness of data, given the uses they are intended for" [33]. Existing research assessing the reliability of crowdsourced data tends to focus on evaluating the quality of content (e.g., presence of metadata [21], detection of rumors [24]) and developing machine learning algorithms or models to assess data reliability [34-36]. Citizen scientists and subject matter experts are also used in reliability validation to differentiate and justify perceived "true incidents".

Despite the abundance of existing evaluation methods, some algorithm-based studies rarely incorporate potentially relevant external data sources into the research context, such as meteorological and geospatial data in flood studies [44] and digital elevation models (DEM) in earthquake or landslide studies [45]. As a result, these studies may fail to capture all the necessary information for reliability validation. Therefore, this research designed a workflow that works closely with reference documents to extract reliable risk information.

Reliability in this research refers to "accuracy of information and the extent to which the data reflects actuality". Using this definition, a workflow was developed to assess the reliability of risk information extracted from relevant tweets obtained for the 2013 Colorado flood event. Using the workflow, we examined the tweet text and images leveraging human intelligence and the Google Cloud Vision API.

The datasets used in this study include historical tweets, geospatial data sets corresponding to the flood event and the study site (e.g., Boulder flood extent map, Boulder street map), and reference documents including news articles and agency reports from the National Weather Service and state and local government agencies. A discussion of the data processing steps and analytical approaches is presented below.
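One way the geospatial layers can support verification is a point-in-polygon screen of geotagged tweets against the flood extent map. The sketch below uses a plain ray-casting test and an illustrative rectangle roughly around Boulder, CO; the coordinates are ours, not the actual Boulder flood extent layer.

```python
# Minimal point-in-polygon check (ray casting) for screening geotagged
# tweets against a flood-extent polygon.

def point_in_polygon(lon, lat, polygon):
    """Return True if (lon, lat) falls inside `polygon`,
    given as a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical extent near Boulder, CO (for illustration only)
flood_extent = [(-105.30, 39.98), (-105.30, 40.05),
                (-105.22, 40.05), (-105.22, 39.98)]

print(point_in_polygon(-105.27, 40.01, flood_extent))  # → True
print(point_in_polygon(-105.10, 40.01, flood_extent))  # → False
```

In practice, the published flood extent polygon would replace the rectangle, and tweets whose coordinates fall inside it would gain one piece of supporting evidence for the reliability assessment.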

The relevance of these tweets was determined first, following which their reliability was evaluated.

The text analysis involves a few consecutive steps. The first and foremost step is to search for evidence in the reference documents, especially weather warning/alert messages from the National Weather Service.

In the image analysis process, 308 images were downloaded from 720 relevant tweets. The images were considered reliable if they met either of the following two conditions: (1) they were corroborated by evidence from credible sources, or (2) they mutually corroborated each other. Both manual and automatic evaluation approaches were implemented to analyze the 308 images. In the manual approach, images were manually examined for damage to roads/streets and properties, as well as their corresponding tweet text content. Next, the image content, geographical locations, and text content were compared to the reference documents. In the automatic/Artificial Intelligence (AI) approach, the 308 images were uploaded to Google Cloud's Vision API (application programming interface) [59], where they were classified and assigned categorical labels using Google's pre-trained machine learning models.
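The automated labeling step can be sketched as below. `label_image` shows the Google Cloud Vision label-detection call (it requires the `google-cloud-vision` package and valid application credentials); `is_flood_related` is an illustrative post-filter, and its keyword list and confidence threshold are our assumptions, not the paper's exact vocabulary.

```python
def label_image(path):
    """Request label detection for one image and return
    (description, confidence) pairs."""
    from google.cloud import vision  # lazy import: needs credentials at call time

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return [(a.description, a.score) for a in response.label_annotations]


# Illustrative flood vocabulary for screening the returned labels
FLOOD_LABELS = {"flood", "water", "river", "waterway", "rain", "disaster"}

def is_flood_related(labels, threshold=0.7):
    """Flag an image as flood-related if any sufficiently
    confident label matches the flood vocabulary."""
    return any(desc.lower() in FLOOD_LABELS and score >= threshold
               for desc, score in labels)
```

In use, `is_flood_related(label_image("tweet_photo.jpg"))` would tag an image for the reliability comparison; labels below the confidence threshold are ignored.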

This approach aims to leverage existing AI (artificial intelligence) tools to improve the efficiency of extracting flood-related features and to facilitate the tweet reliability evaluation process.

Table 1 shows how the names of damaged roads/streets, tweet post time, and detailed damage/impact were extracted manually. The locations of the tweets shown in Table 1 are mapped in Figure 2.

Using the key phrase "west of Broadway", a related NOAA warning/alert message was found for tweet #1 in Table 1: "Hourly rainfall intensity at the Sugarloaf RAWS station 6 mi. west of Boulder compared with gage height on Boulder Creek at Boulder (west of Broadway). The first flood peak closely followed the heavy rainfall before midnight on 9/11-12, when 3.5" fell in 6 hours. (Data: rainfall: RAWS via WRCC; and streamflow: Colorado DWR; plotted by Jeff Lukas, WWA)".

The above message mentioned the gauge height on Boulder Creek west of Broadway following the flood peak that resulted from heavy rainfall before midnight on September 11th. This corresponds to tweet #1 and explains why "Boulder Creek is about to spill its bank at west of Broadway" at 3:02 am on September 12th. Therefore, tweet #1 in Table 1 was considered reliable in terms of its location, time, and content.
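The key-phrase lookup that matched tweet #1 against the NOAA message can be sketched as a case-insensitive substring search over a corpus of reference documents. The snippets below are abridged illustrations, not the full documents used in the study.

```python
def find_evidence(key_phrase, documents):
    """Return the reference documents whose text contains the
    key phrase (case-insensitive substring match)."""
    phrase = key_phrase.lower()
    return [doc for doc in documents if phrase in doc["text"].lower()]

# Illustrative reference corpus (abridged)
reference_docs = [
    {"source": "NOAA", "text": "gage height on Boulder Creek at Boulder "
                               "(west of Broadway); heavy rainfall before midnight"},
    {"source": "County report", "text": "Debris flows reported in the foothills."},
]

hits = find_evidence("west of Broadway", reference_docs)
print([d["source"] for d in hits])  # → ['NOAA']
```

A tweet whose key phrase returns at least one hit gains documentary evidence toward the reliable classification; more robust matching (stemming, fuzzy matching) could substitute for the substring test.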
When searching for "Broadway" and "Arapahoe Avenue", no direct evidence was found for tweet #2, which may be because Arapahoe Avenue is a county road and is generally too specific to be mentioned in official warning or damage assessment reports. However, as shown by symbol #2 in Figure 2,

The crossing of 8th Street and Marine Street (symbol #3 in Figure 2) was adjacent to and flooded by Gregory Canyon Creek, which corresponded to tweet #3 (see Table 1) indicating that the drainage at Gregory Canyon overflowed 8th Street. Based on the time, tweet #2 identified a rapid increase of the water level on Boulder Creek at 5:30 am, and 20 minutes later, tweet #3 reported inundation of roads due to flooding of Gregory Canyon Creek, which is close to Boulder Creek. This confirmed that the risk information in tweet #3 is reliable based on content, time, and geographical location.
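The mutual-corroboration check used above, two tweets posted close together in time and space along connected creeks, can be sketched as a joint time/distance test. The coordinates and thresholds below are illustrative assumptions, not values from the study.

```python
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

def corroborates(tweet_a, tweet_b, max_minutes=30, max_km=2.0):
    """Treat two tweets as mutually corroborating if they were
    posted within `max_minutes` and `max_km` of each other."""
    gap_min = abs((tweet_a["time"] - tweet_b["time"]).total_seconds()) / 60
    dist_km = haversine_km(tweet_a["lat"], tweet_a["lon"],
                           tweet_b["lat"], tweet_b["lon"])
    return gap_min <= max_minutes and dist_km <= max_km

# Hypothetical coordinates near Boulder Creek / Gregory Canyon Creek
tweet2 = {"time": datetime(2013, 9, 12, 5, 30), "lat": 40.013, "lon": -105.281}
tweet3 = {"time": datetime(2013, 9, 12, 5, 50), "lat": 40.006, "lon": -105.287}
print(corroborates(tweet2, tweet3))  # → True
```

A hydrological check, e.g., whether both points lie on connected drainages, would strengthen the spatial half of this test.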

The intersection of 28th Street and Colorado Avenue (symbol #4 in Figure 2) is between Boulder Creek and Skunk Creek, and tweet #4 was posted at the peak of the flooding when water overflowed.

Tweet #5 in Table 1 was posted in a similar context as tweet #4, and the user seemed to have witnessed the flooded neighborhood streets. Since this tweet was deemed reliable, 15th Street (where the tweet was posted) could be marked as inundated so that others could avoid this road.

State Highway 36 was mentioned several times in tweets #6, #10, #11, and #12 (Table 1). The earliest mention was on September 12th, when excessive rainfall continued to intensify the flooding situation. Those tweets also disclosed other details about Highway 36, such as "raining and pouring", "flooded by over 3 feet of water", and its subsequent closure. Evidence of this was also found in the reference documents.

Tweets #6, #10, and #12 were also geo-located along Highway 36, but tweet #11 was posted beyond the city limits of Boulder. Because this tweet was posted from a place farther from the impacted location, it was hard to prove its reliability without referring to other tweets that also mentioned Highway 36. However, because the content of this tweet was also mentioned in other tweets, it was considered reliable. Consequently, keywords that were verified to be related to important incidents/places, such as Highway 36, could be used to extract tweets that were beyond the spatial limit of the study area or that do not possess any geo-location information.
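The keyword-based recovery of tweets that lack usable geo-information can be sketched as a simple filter over the tweet text. The keyword set and sample tweets below are illustrative; in the study the keywords would come from incidents/places already verified as reliable.

```python
# Keywords confirmed via already-verified tweets (illustrative set)
VERIFIED_KEYWORDS = {"highway 36", "boulder creek", "broadway"}

def recover_tweets(tweets, keywords=VERIFIED_KEYWORDS):
    """Keep tweets that mention a verified keyword, even when they
    carry no geo-coordinates or were posted outside the study area."""
    return [t for t in tweets
            if any(k in t["text"].lower() for k in keywords)]

sample = [
    {"text": "Highway 36 flooded by over 3 feet of water", "geo": None},
    {"text": "Lovely sunny day in Denver", "geo": None},
]
print(len(recover_tweets(sample)))  # → 1
```

Recovered tweets would still pass through the same reliability checks; the keyword match only restores them to the candidate pool.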

Tweets #8 and #9 were geo-located along the flooded Skunk Creek (symbols #8 and #9 in Figure 2). While 30th Street was flooded, the adjacent Colorado Avenue was already closed. Both streets are in the Foothills area, which was reported to have been seriously impacted by the flood in a damage assessment report summary: "Foothills around Boulder also saw severe flooding and debris flows" [54].

Evaluation of image content
Interactive delivery of information (stories) through images is more engaging because it is an effective way to visualize information, enabling the brain to process and organize that information.

The images shown in Figure 5 were taken at the same location by different people, at different

We believe that doing this would yield a larger volume of relevant tweets that would otherwise be discarded due to lack of geo-information. Without geolocation, it is possible that those tweets may have been sent from outside the study area, but the time frame (September 9th to 18th, 2013) and keywords (a. location

The keywords we used were from

This is a big improvement over using geo-tagged tweets alone in this research workflow. Including both text and images would extract more information than using either alone.

The strengths of this research are: 1) precipitation data were used to account for the cause of the flood, based on their proximity to events in the surrounding areas by pinpointing them on maps, which is impossible for current AI approaches to achieve. Given that current neural networks (e.g., ResNet, U-Net) used for disaster situations require human intelligence to collect and label significant amounts of training images, our manual approach complemented the AI approach. The GCV API could be replaced with other AI algorithms. However, our research workflow can be repurposed by researchers interested in designing automatic or semi-automatic systems to extract reliable and relevant data and information from social media streams for disaster response.

USGS's citizen science program (https://www.usgs.gov/topic/citizen-science) and FEMA's crowdsourcing and citizen science efforts have allowed citizens to participate in emergency management and response efforts to complement the activities underway by decision-makers. With the involvement of these digital humanitarians, we believe that the workflow outlined in our research can be partially or fully adopted in disaster responses. Further, the AI approach was able to detect reliable information for 11% of the images, which is less than the percentage achieved by the manual approach (19%), and most of the images identified by the AI approach were also identified by the manual approach. The AI approach has low accuracy because it was developed for general-purpose image detection and understanding, not tailored for flood/disaster learning. If more images are used to train the AI model, it has the potential to significantly improve in accuracy. This approach requires less human labor; thus, it is complementary to the manual approach and is advantageous when a significant number of tweets are available. Finally, human errors and heuristic bias may be introduced in manual approaches, even though multiple authors cross-checked the results.

Considering the limitations of this research workflow, future research will focus on streamlining the process and automating the entire workflow of assessing the relevance and reliability of Twitter data. Moreover, integration of citizen-led reliability evaluation efforts following well-