Multi-Dimensional Urban Flooding Impact Assessment Leveraging Social Media Data: A Case Study of the 2020 Guangzhou Rainstorm

Lu, Shuang; Huang, Jianyun; Wu, Jing

doi:10.3390/w15244296


Shuang Lu ¹, Jianyun Huang ¹,* and Jing Wu ²

¹ Design School, Shanghai Jiao Tong University, Shanghai 200240, China
² School of Urban Design, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.

Water 2023, 15(24), 4296; https://doi.org/10.3390/w15244296

Submission received: 17 November 2023 / Revised: 11 December 2023 / Accepted: 12 December 2023 / Published: 17 December 2023


Abstract

In the context of global climate change and rapid urbanization, urban flooding poses significant challenges worldwide, and effective rapid assessments are needed to understand its impacts on various aspects of urban systems. Such assessments can be supported by collecting and analyzing big data sources such as social media data. However, the existing literature offers few comprehensive disaster impact assessments based on social media data. This study adopts a mixed-methods approach that combines statistical analysis, machine learning algorithms, and geographical analysis to examine the impacts of urban flooding, using the 2020 Guangzhou rainstorm event as a case study. The results show that: (1) analyzing social media content enables monitoring of how a disaster situation develops, with the distribution of impact categories varying across the phases of the urban flood event; (2) a lexicon-based approach allows specific sentiment categories to be tracked, revealing that different impact topics contribute differently to negative sentiments; (3) location information derived from social media texts can reveal the geographic distribution of impacted areas, and the waterlogging hotspots are significantly correlated with four predisposing factors, namely precipitation, proportion of built-up surfaces, population density, and road density. Consequently, this study suggests that collecting and analyzing social media data is a reliable and feasible way to conduct rapid impact assessments of disasters.

Keywords:

social media; disaster assessment; spatiotemporal analysis; sentiment analysis; urban flooding

1. Introduction

In recent decades, urban flooding has emerged as a pervasive and economically burdensome natural disaster worldwide, with severe adverse impacts on the environmental, social, and economic sustainability of urban areas [1,2]. Urban floods not only cause direct tangible damage to residential, commercial, and industrial property but also significantly disrupt urban services and functions [3,4]. Under global climate change, both the frequency and intensity of extreme rainfall events are expected to increase. In addition, the global urban population is projected to reach 6.7 billion by 2050, accounting for 68% of the world's population, which means that the population and assets exposed to urban flooding will continue to grow [3,5,6]. Consequently, the threat of urban flooding is expected to intensify, calling for more effective and adaptive flood management strategies and measures. The concept of flood resilience has attracted increasing attention from scholars and practitioners. Although definitions of resilience vary, there is a consensus that flood resilience emphasizes the capacity to mitigate the consequences of flooding [4,7,8,9]. Rapid disaster impact assessment is therefore a fundamental requirement for effective disaster management and plays a significant role in enhancing urban resilience and reducing vulnerability to natural disasters [10,11,12,13,14,15]. However, traditional data sources and approaches, such as official reports, remote sensing, and field surveys, often respond slowly to actual disaster events, gather incomplete information, and are costly. These limitations hinder their ability to provide timely disaster impact assessments [16,17,18].

In the current era of big data, user-generated data, particularly social media data, have become increasingly popular. Advances in web crawler technology enable efficient and timely access to information posted on social media platforms. Moreover, social media platforms are a valuable channel for public voices and participation, complementing traditional data sources by providing insights into disaster situations and public behavior [19,20]. Owing to these merits, social media holds promise as an alternative data source that helps government decision-makers and scholars better understand and respond to disasters [21,22,23]. Social media serves multiple purposes in disaster management, including early warning, information dissemination, and support for response and management efforts [24,25,26]. One of its most prominent benefits is improved situational awareness: it enables the extraction of diverse human perceptions of an event, such as a disaster, to help interpret the situation and make informed decisions [27,28]. Social media users can share real-time information about their experiences and emotions from the location where an incident occurs [29,30]. Given these characteristics, the interpretation of social media data for disaster impact assessment has attracted considerable academic attention.

Previous studies have demonstrated that social media activity is highly useful for evaluating disaster impacts. Kryvasheyeu et al. [25] found a strong correlation between per-capita Twitter activity related to Hurricane Sandy and the resulting per-capita economic damage. Guan and Chen [31] suggested that social media activity reflects human life patterns or rhythms that are closely intertwined with the temporal and spatial processes of disasters. Meanwhile, many scholars have used the textual content of social media data to examine disaster impacts. These studies have primarily evaluated disaster impacts along two dimensions: quantitatively assessing the severity of damage and classifying the types of impact. Quantitative analyses of disaster impact or damage often use social media ratios, such as the DIRR (Disaster-Related Ratio) and DARR (Damage-Related Ratio), as well as sentiment analysis [18,25,31,32,33,34]. Studies focusing on detailed impact classification may employ keyword-based approaches. For instance, Wu et al. [35] summarized the impacts of Typhoon Lekima as affected people, affected agriculture, and collapsed houses based on keyword extraction. Other studies have further divided disaster damage into physical and emotional damage and proposed frameworks to quantify the damage in each category by constructing lexical dictionaries [11,14]. However, gaps remain. The temporal variation of specific impact aspects, especially impacts on daily human activities, is rarely explored. In addition, although previous studies have discussed the correlation between public sentiment and disaster intensity or damage [25,32,36], the correlation between public sentiment and specific aspects of disaster impact has not been investigated further.

Another challenge is deriving valuable information from abundant and noisy raw data to identify the areas impacted by a disaster. Spatial reference information on social media can be detected through spatial coordinate tags and users' profiles [37]. The spatial pattern of social media activity can also indicate proximity to a disaster [18,25,34,38]. Several studies have used social media data as a standalone source, or integrated them with other sources such as remote sensing and topographic data, to map disaster extents and impacted areas [34,39,40,41,42,43]. However, such studies remain limited. When investigating the spatial characteristics of social media activity, it is important to note that locations voluntarily tagged by users may be biased and may not correspond to the actual incident locations [13,44]. In addition, most existing studies have focused on large-scale disasters (e.g., hurricanes and basin floods), and impact severity has primarily been analyzed at administrative scales, such as the provincial [35,45], city [31], or county [34,46] level. Little attention has been given to fine-grained impact mapping of small-scale disasters such as urban floods, which requires more localized and scattered location information [47].

This study explores how disaster impacts can be perceived through social media texts, using the 2020 Guangzhou rainstorm event as a case study. First, impact categories were classified and quantified through multi-label classification that combines keyword frequency analysis with machine learning algorithms. Second, public sentiment in response to the urban flooding was analyzed using a lexicon-based approach. Third, the impacted areas (waterlogging spots in this case) during the rainstorm event were identified from location information extracted from social media texts. Through these steps, this study addresses three research questions: (1) how do disaster impacts evolve over time; (2) how is public sentiment correlated with disaster impacts; and (3) what are the spatial distribution characteristics of the impacted areas identified from social media data?

The remainder of this paper is structured as follows: Section 2 introduces the case study, and Section 3 describes the proposed framework, including data collection and data analysis. Section 4 presents the empirical results for the Guangzhou rainstorm event. Section 5 discusses the key findings and their potential implications for practice. Finally, Section 6 concludes the paper, addresses its limitations, and offers suggestions for future research.

2. Case Introduction

The case study was selected based on three criteria. First, the case city had to be a megacity with a large population, ensuring frequent and intensive discussion of rainstorm events on social media in the Chinese context. Second, the city had to have experienced at least one disruptive rainstorm event in the past five years. Third, sufficient meteorological and social media data had to be available for that event. Guangzhou, with a population of 15.31 million (as of 2019) and an extreme rainstorm on 22 May 2020, was therefore selected for this study.

Guangzhou is located in southern mainland China, near the estuary of the lower Pearl River Basin, spanning 22°26′–23°56′ N and 112°57′–114°3′ E. It is the capital of Guangdong Province, one of the four central cities of the Guangdong–Hong Kong–Macao Greater Bay Area, one of China's nine national central cities, and an international transportation hub. Guangzhou consists of 11 districts (Yuexiu, Haizhu, Liwan, Tianhe, Baiyun, Huangpu, Huadu, Panyu, Nansha, Conghua, and Zengcheng) with a total area of 7434.4 km². The city has a marine subtropical monsoon climate, and its average annual precipitation ranges from 1673.0 mm to 2004.6 mm. Precipitation is distributed unevenly over the year: approximately 80% falls between April and September, and there are about 150 precipitation days per year on average. Guangzhou has urbanized rapidly since the “Reform and Opening-up” policies were implemented around the 1980s. Extensive urban expansion has disrupted the original hydrological cycle, increasing the risk of urban flooding. Since the start of the 21st century, Guangzhou has suffered several major urban flood disasters with significant loss of life and property, one of which was the 2020 rainstorm event selected for this research.

On 22 May 2020, Guangzhou experienced an extraordinary rainstorm of great intensity and extensive coverage. Over the course of the rainstorm, the citywide average precipitation reached 101 mm, with the highest values recorded in Huangpu (176.2 mm), Zengcheng (155.4 mm), and Conghua (114.4 mm). Notably, the Yonghe Street meteorological station in Huangpu District recorded an accumulated precipitation of 378.6 mm, the highest in the past century. Furthermore, 42 meteorological stations reported record-breaking hourly precipitation. The consequences were severe: the rainstorm caused widespread urban flooding, leading to the closure of numerous roads and tunnels, disruptions to the metro system, four deaths, and substantial property damage.

3. Data and Methods

The framework of multi-dimensional urban flooding impact assessment is depicted in Figure 1, and the specific steps of this study are elaborated below:

3.1. Data Collection and Pre-Processing

Social media data were collected from Sina Weibo, a prominent Chinese microblogging platform analogous to Twitter. Sina Weibo has a substantial user base, with hundreds of millions of monthly active users. Weibo users can share their personal experiences, emotions, and opinions in posts of up to 140 characters, and these messages can be disseminated further through reposting. Weibo accounts fall into two types: official accounts and personal accounts. Official accounts belong to institutional users verified by Weibo, including government agencies, mainstream media outlets, companies, and other organizations, whereas personal accounts belong to individual users, including celebrities and members of the general public.

A large number of relevant texts were posted before, during, and after the 2020 Guangzhou rainstorm. Data were acquired through keyword-based search: a Python-based web crawler retrieved texts associated with the rainstorm event using key phrases such as ‘Guangzhou rainstorm’, ‘Guangzhou flood’, and ‘Guangzhou waterlogging’. Because this study aimed to collect data on the impacts of the urban flood disaster, and such data were scarce before the rainstorm began, the crawling timeframe was set from the onset of the rainstorm to seven days after the event, a total of eight days from 22 May 2020 to 29 May 2020. For each text, the user's nickname, account ID, text content, posting time, and posting location (available only when users voluntarily tagged their locations) were obtained. After irrelevant and duplicate posts were manually filtered out, 7707 original microblog texts were retained for further investigation. In addition, unlike English, Chinese sentences have no explicit word boundaries, so word segmentation is a necessary step in Chinese natural language processing. In this study, word segmentation was performed with Jieba, an open-source tool designed specifically for Chinese word segmentation.
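The automatic part of this pre-processing can be sketched as follows. This is a minimal illustration, not the authors' crawler: it assumes each crawled post is a dict with hypothetical keys `account_id`, `text`, and `posted_at`, keeps only posts inside the 22–29 May 2020 window, and drops verbatim duplicates after whitespace normalization. The manual relevance screening is not reproduced, and segmentation would follow on the retained texts (with Jieba, for example via `jieba.lcut(text)`).

```python
from datetime import datetime

# Study window: onset of the rainstorm through seven days after (8 days in total).
WINDOW_START = datetime(2020, 5, 22)
WINDOW_END = datetime(2020, 5, 30)  # exclusive upper bound, i.e. through 29 May

def normalize(text):
    """Collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.split())

def preprocess(posts):
    """Keep in-window posts and drop duplicates of an already retained text.

    `posts` is a list of dicts with hypothetical keys 'account_id',
    'text', and 'posted_at' (a datetime). The earliest copy of a text is kept.
    """
    seen = set()
    kept = []
    for post in sorted(posts, key=lambda p: p["posted_at"]):
        if not (WINDOW_START <= post["posted_at"] < WINDOW_END):
            continue  # outside the crawling timeframe
        key = normalize(post["text"])
        if key in seen:
            continue  # same text already retained (e.g. a copy by another account)
        seen.add(key)
        kept.append(post)
    return kept
```

Keying duplicates on the normalized text alone treats identical posts from different accounts as one; whether the study did the same is not stated.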

3.2. Framework of Multi-Dimensional Urban Flooding Impact Assessment

3.2.1. Step 1: Impact Topic Extraction and Quantification

(1): Urban flooding impact topic classification

The impact assessment began by categorizing impact topics, based on an impact-related keyword lexicon. Specifically, word frequency analysis was first performed on the collected social media data, and words occurring more than 10 times were retained as candidates. The keywords describing the disaster's impacts were then manually extracted from these candidates to build the impact-related keyword set, which was organized into five categories: ‘traffic’, ‘life & property loss’, ‘work’, ‘education’, and ‘infrastructure failure’.
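A minimal sketch of the frequency threshold and the resulting topic lexicon is shown below. It assumes texts have already been segmented into word lists; the keywords in `IMPACT_LEXICON` are hypothetical placeholders, not the lexicon built in this study, whose curation was manual. `candidate_topics` mirrors the keyword matching that selects candidate impact-related texts before classification.

```python
from collections import Counter

def frequent_words(segmented_texts, min_count=10):
    """Return {word: count} for words occurring more than `min_count` times overall."""
    counts = Counter(word for tokens in segmented_texts for word in tokens)
    return {word: n for word, n in counts.items() if n > min_count}

# Hypothetical excerpt of a manually curated impact lexicon (topic -> keywords).
IMPACT_LEXICON = {
    "traffic": {"地铁", "停运", "堵车"},
    "life & property loss": {"被淹", "车库", "遇难"},
    "work": {"上班", "迟到"},
    "education": {"停课", "考试"},
    "infrastructure failure": {"停电", "停水", "没信号"},
}

def candidate_topics(tokens):
    """Topics whose keywords appear in a segmented text (candidate labels only)."""
    words = set(tokens)
    return {topic for topic, keys in IMPACT_LEXICON.items() if words & keys}
```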

(2): Impact evaluation and temporal evolution

Following a premise widely adopted in previous studies, namely that the extent or severity of disaster impacts is significantly and positively correlated with the volume of microblogs reporting them [11,14,20,24,48,49], this study evaluated the impacts of urban flooding by counting impact-related texts. To identify these texts, microblogs containing the impact-related keywords described above were first selected as candidates. Because impact-related keywords may also appear in microblogs that do not actually describe the impact status [13], the candidates required further screening. Since a single microblog may describe multiple impact topics, identifying impact-related texts is a multi-label classification task [50]. For each impact topic, 1500 texts were randomly sampled and manually labeled according to whether they contained a description of, or a comment on, that topic, using a binary scheme in which 0 denoted unrelated and 1 denoted related. A machine learning algorithm was then used to classify the textual data automatically. Two text-classification algorithms were compared: Support Vector Machine (SVM) and Naïve Bayes (NB). As shown in Table 1, SVM achieved higher accuracy than NB for every impact topic, so SVM was chosen to identify the impact-related texts. Each microblog was classified five times, once by the binary classifier for each impact topic. A text assigned one or more impact topic labels was considered impact-related; otherwise, it was classified as unrelated. Finally, the classification results were manually reviewed to ensure the accuracy of the final classification.
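The per-topic scheme (often called binary relevance) can be sketched as below. To stay dependency-free, this sketch swaps in a small linear SVM trained with the Pegasos subgradient method on bag-of-words counts; the authors' SVM implementation, features, and hyperparameters are not specified, so `lam`, `epochs`, and the toy training data are assumptions. One classifier is trained per topic, and a text counts as impact-related if any classifier fires.

```python
from collections import Counter

BIAS = "<bias>"  # constant feature, so the bias is learned like any other weight

def featurize(tokens):
    """Bag-of-words counts for a segmented text, plus the bias feature."""
    x = Counter(tokens)
    x[BIAS] = 1
    return x

def score(w, x):
    """Linear decision value w . x over sparse dict features."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_linear_svm(xs, labels, lam=0.1, epochs=100):
    """Pegasos subgradient descent on the L2-regularized hinge loss.

    xs: feature Counters; labels: 0/1, mapped to -1/+1. Samples are visited in
    a fixed order (no random sampling), so results are reproducible.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(xs, labels):
            t += 1
            y = 1 if label == 1 else -1
            eta = 1.0 / (lam * t)
            violated = y * score(w, x) < 1  # margin under the current weights
            for f in w:  # regularization step shrinks every weight
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def train_binary_relevance(token_lists, topic_labels):
    """One independent binary SVM per impact topic (binary relevance)."""
    xs = [featurize(tokens) for tokens in token_lists]
    return {topic: train_linear_svm(xs, labels) for topic, labels in topic_labels.items()}

def classify(models, tokens):
    """Set of topics predicted as related; an empty set means 'unrelated'."""
    x = featurize(tokens)
    return {topic for topic, w in models.items() if score(w, x) > 0}
```

In practice a library classifier (for example scikit-learn's `LinearSVC` on TF-IDF features) would be the usual choice; the structure of one independent classifier per topic stays the same.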

After the impact-related texts were identified, the temporal evolution of each impact topic was analyzed at two time scales: variation within the rainstorm day and daily variation over the entire study period.

3.2.2. Step 2: Analysis of Public Sentiment Responding to the Disaster

Two approaches, the sentiment lexicon-based method and the machine learning method, are widely used to investigate public sentiment in response to disasters [24,51,52,53]. This study used the lexicon-based method because sentiment lexicons are well suited to low-granularity texts with short lengths and sentences (e.g., social media texts), and the method is efficient and accurate [54]. However, the quality of lexicon-based sentiment analysis depends on the sentiment dictionary used. This study employed the emotion ontology developed by Dalian University of Technology (DLUT Emotion Ontology), one of the most widely used dictionaries for Chinese sentiment analysis, which contains more than 30,000 emotion-related terms [55,56]. The DLUT Emotion Ontology assigns terms to seven emotion categories (good, joy, surprise, disgust, sadness, fear, and anger) and gives each term an emotional intensity level that reflects its sentiment strength. In this study, the sentiment score of each microblog text was calculated by matching its words against the DLUT Emotion Ontology and extracting the emotion-related terms and the relevant degree adverbs. For each text, the sentiment score of each emotion category was determined as follows:

SS_{k} = \sum_{i=1}^{n} \left( \sum_{j=1}^{m} x_{j} \right) w_{i}

where SS_k represents the sentiment score of emotion category k; w_i represents the normalized sentiment intensity of emotion-related term i; and x_j represents the weight of degree adverb j.
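A minimal sketch of this scoring rule follows. The lexicon entries, intensities, and adverb weights below are hypothetical stand-ins for the DLUT Emotion Ontology, and two details are assumptions not stated in the source: degree adverbs attach to the next emotion term that follows them, and a term with no preceding adverb uses a multiplier of 1.0 (otherwise the inner sum would be empty).

```python
# Hypothetical DLUT-style entries: word -> (emotion category, normalized intensity w_i).
EMOTION_LEXICON = {
    "害怕": ("fear", 0.7),
    "担心": ("fear", 0.5),
    "平安": ("good", 0.6),
}
# Hypothetical degree-adverb weights x_j ("非常" = very, "有点" = a little).
DEGREE_ADVERBS = {"非常": 2.0, "有点": 0.5}

def sentiment_scores(tokens):
    """Accumulate SS_k = sum_i (sum_j x_j) * w_i over a segmented text.

    Adverb weights seen since the last emotion term are summed and applied
    to the next emotion term; an unmodified term uses 1.0 (an assumption).
    """
    scores = {}
    pending = []  # adverb weights x_j waiting for the next emotion term
    for tok in tokens:
        if tok in DEGREE_ADVERBS:
            pending.append(DEGREE_ADVERBS[tok])
        elif tok in EMOTION_LEXICON:
            category, w_i = EMOTION_LEXICON[tok]
            multiplier = sum(pending) if pending else 1.0
            scores[category] = scores.get(category, 0.0) + multiplier * w_i
            pending = []
    return scores
```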

3.2.3. Step 3: Analysis of Spatial Distribution Patterns of Impacted Areas

The spatial analysis investigated the characteristics of the areas affected by the rainstorm event using location information extracted from social media texts. It comprised three sequential components: identification of waterlogging spots, analysis of spatial distribution patterns, and correlation analysis.

(1): Identification of waterlogging spots

To identify the impacted areas, the locations where waterlogging occurred were detected directly from social media texts using a Named Entity Recognition (NER) tool based on the Python package pyhanlp. Specifically, a predefined location lexicon was constructed from the Guangzhou locations mentioned on social media, including administrative subdistricts (e.g., towns and urban villages), traffic-related areas (e.g., urban roads/streets and public transport stations), residential areas (e.g., dwellings and gated neighborhoods), and public areas (e.g., educational institutions, squares, and shopping centers). To ensure the credibility of the waterlogging spots, all texts containing specific locations were then manually reviewed to identify valid waterlogging spots and remove ambiguous information. Waterlogging occurrences reported by the official accounts of government agencies (e.g., “Guangzhou traffic”, “Guangzhou firefighting”, etc.) and authoritative media (e.g., “Sina Guangdong”, “Guangzhou Daily”, etc.) were accepted directly as waterlogging spots. For locations mentioned by personal accounts, only texts that explicitly reported waterlogging with detailed descriptions were admitted. Finally, after all waterlogging locations were collected, the Gaode Map (Amap) API, a widely used digital map service, was called to convert the location names into longitude and latitude coordinates for impacted area mapping and further analysis.
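The screening rules can be approximated as below. This sketch replaces the pyhanlp-based NER with a plain lexicon lookup, and the location names, official-account list, and waterlogging cue words are hypothetical placeholders. Official posts that mention a known location together with a waterlogging cue are accepted; personal posts that pass the same test are only queued for manual review, reflecting the stricter rule above. Geocoding with the Gaode Map API is omitted.

```python
# Hypothetical placeholders: location lexicon (name -> type), official accounts, cue words.
LOCATION_LEXICON = {"示例路": "road", "示例村": "urban village", "示例地铁站": "station"}
OFFICIAL_ACCOUNTS = {"Guangzhou traffic", "Guangzhou Daily"}
WATERLOGGING_CUES = ("积水", "水浸", "被淹")  # "standing water", "flooded", "inundated"

def mentioned_locations(text):
    """Lexicon lookup: every known location name that occurs in the text."""
    return {name for name in LOCATION_LEXICON if name in text}

def screen_posts(posts):
    """Split candidate waterlogging spots into accepted and needs-review sets.

    posts: iterable of (account_name, text) pairs.
    """
    accepted, needs_review = set(), set()
    for account, text in posts:
        locations = mentioned_locations(text)
        if not locations or not any(cue in text for cue in WATERLOGGING_CUES):
            continue  # no known location, or no report of waterlogging
        if account in OFFICIAL_ACCOUNTS:
            accepted |= locations
        else:
            needs_review |= locations  # personal posts still need a manual check
    return accepted, needs_review
```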

(2): Spatial distribution pattern of waterlogging occurrences

Based on the waterlogging locations, the spatial distribution pattern of the impacted areas was explored through spatial autocorrelation analysis at a spatial scale of 2 km. Spatial autocorrelation analysis comprises two categories: global and local spatial autocorrelation. Using ArcGIS 10.8, this study applied the global Moran's index to determine whether spatial agglomeration exists, calculated as follows:

\text{Moran's } I = n \times \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_{i} - \bar{x})(x_{j} - \bar{x})}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij} \sum_{i=1}^{n}(x_{i} - \bar{x})^{2}}

where n is the number of spatial units; w_ij is the spatial weight between units i and j, an element of the spatial weight matrix; x_i is the observation in spatial unit i; and x̄ is the mean of the observations across all spatial units.

Meanwhile, the local Moran’s index was employed to explore the detailed spatial agglomeration pattern of waterlogging occurrences during the rainstorm event.
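Both statistics can be illustrated on a toy grid. This sketch assumes binary rook-contiguity weights on a regular grid of cells holding waterlogging counts (the source does not state its weighting scheme), implements the global formula above, and uses Anselin's local Moran form I_i = (z_i / m_2) Σ_j w_ij z_j with m_2 = Σ z_i² / n. ArcGIS 10.8 may normalize the local statistic differently and also reports significance (z-scores and p-values), which is not reproduced here.

```python
def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a rows x cols grid, as {cell: [neighbors]}."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs

def deviations(x):
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def global_morans_i(x, nbrs):
    """I = n * sum_ij w_ij z_i z_j / (sum_ij w_ij * sum_i z_i^2), with w_ij in {0, 1}."""
    z = deviations(x)
    s0 = sum(len(js) for js in nbrs.values())  # sum of all weights w_ij
    cross = sum(z[i] * z[j] for i, js in nbrs.items() for j in js)
    return len(x) * cross / (s0 * sum(v * v for v in z))

def local_morans_i(x, nbrs):
    """Anselin's local Moran: I_i = (z_i / m2) * sum_j w_ij z_j, m2 = sum z^2 / n."""
    z = deviations(x)
    m2 = sum(v * v for v in z) / len(x)
    return [z[i] / m2 * sum(z[j] for j in nbrs[i]) for i in range(len(x))]
```

With these weights the local values sum to S_0 × I (S_0 being the sum of all weights), a quick consistency check between the two functions.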

(3): Correlation analysis

Statistical analysis was conducted to explore the correlation between the impacted areas identified from social media and real-world characteristics. Nine factors commonly used in previous studies were selected, falling into four groups: natural hazards (daily precipitation), topographic features (elevation, slope, and curvature), land use/land cover (LULC) conditions (Normalized Difference Vegetation Index (NDVI), proportion of built-up surfaces, and proportion of water surface), and anthropogenic factors (population density and road density). Table 2 provides detailed information on the data used in this study.

To ensure that the factors were sufficiently independent, multicollinearity among them was diagnosed using the variance inflation factor (VIF) and its reciprocal, tolerance. As shown in Table 3, all nine predisposing factors had VIF values below 5 and tolerance values above 0.2, so no serious multicollinearity was detected among them.
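A dependency-free sketch of this diagnostic is shown below. It relies on the standard identity that VIF_j equals the j-th diagonal element of the inverse of the predictors' correlation matrix (equivalently 1 / (1 − R_j²), where R_j² comes from regressing factor j on the others); the factor columns in the test are toy values, not the study's data.

```python
from statistics import mean, pstdev

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length factor columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        standardized.append([(v - m) / s for v in col])
    k = len(columns)
    return [[sum(a * b for a, b in zip(standardized[i], standardized[j])) / n
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif_and_tolerance(columns):
    """(VIF_j, tolerance_j) per factor, from the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [(inv[j][j], 1.0 / inv[j][j]) for j in range(len(columns))]
```

Applied to the study's nine factor columns, each pair could be checked against the thresholds used here (VIF below 5, tolerance above 0.2).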

The relationship between the waterlogging occurrences identified from microblog texts and the nine predisposing factors was examined using a binary logistic regression model, which is suitable when the dependent variable is binary and the independent variables are categorical or numerical. In this study, the dependent variable was the presence (1) or absence (0) of a waterlogging occurrence in each spatial unit.
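A minimal sketch of the model follows. It is fitted by batch gradient ascent on the mean log-likelihood rather than the iteratively reweighted least squares used by most statistical packages, and the learning rate, iteration count, and toy data are assumptions. Each row would hold one spatial unit's predisposing factors (ideally standardized), and `y` marks whether a waterlogging occurrence was reported in that unit.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def fit_logistic(X, y, lr=0.1, iters=5000):
    """Binary logistic regression by batch gradient ascent on the mean log-likelihood.

    X: list of feature rows; y: 0/1 presence of waterlogging per spatial unit.
    Returns [intercept, beta_1, ..., beta_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            err = target - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            grad[0] += err
            for k, v in enumerate(row):
                grad[k + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, row):
    """Predicted probability of waterlogging for one spatial unit."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
```

exp(β_k) gives the odds ratio for a one-unit increase in factor k; significance testing (e.g., Wald tests) is left to a statistics package.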

4. Results

4.1. Impact Assessment

4.1.1. Impact Topic Classification and Quantification

The keywords related to disaster impacts were grouped into five topics: ‘life & property loss’, ‘traffic’, ‘education’, ‘work’, and ‘infrastructure failure’. To show the specific impacts described, Table 4 summarizes the top five keywords for each topic. Under ‘life & property loss’, the top keywords indicate that residents' lives were endangered and that their property was significantly damaged, including damaged buildings and flooded vehicles. Under ‘traffic’, residents' travel was disrupted in two respects: public transportation and private travel. Activities related to ‘work’ were also significantly hindered, as many people were unable to get to work. Under ‘education’, the most significant impact was the suspension of classes in some districts, and educational activities at universities (e.g., exams) were also affected. Social media also revealed impacts related to ‘infrastructure failure’: the rainstorm disrupted network signals, electricity, and water supply.

Texts related to each impact topic were identified through multi-label topic classification. Table 5 shows the distribution of impact topics over the whole period, on the rainstorm day, and during the post-rainstorm stage. Overall, ‘traffic’ was the most frequently mentioned topic, with 1807 texts, followed by ‘life & property loss’ (1527) and ‘work’ (691), while each of the remaining two topics had fewer than 300 texts.

Table 5 also shows the distribution of impact topics on the rainstorm day and during the post-rainstorm stage. Specifically, ‘traffic’ (1117 texts) was the top topic on the rainstorm day, followed by ‘life & property loss’ and ‘work’ (588 and 521 texts, respectively). In addition, 197 texts were related to ‘education’, while ‘infrastructure failure’ was the least frequently identified. After the rainstorm subsided, ‘life & property loss’ became the most prominent topic, with 939 texts. ‘Traffic’ and ‘work’ ranked second and third, whereas ‘education’ and ‘infrastructure failure’ were rarely mentioned during the post-rainstorm stage.

4.1.2. Temporal Analysis of Impact Topics

Figure 2 shows the volume of impact-related texts for the five topics over the entire study period in 6-h intervals, indicating that the topics persisted for different lengths of time. Most topics, including ‘work’, ‘education’, and ‘infrastructure failure’, were active only on the rainstorm day, suggesting that impacts in these areas were resolved soon after the rainstorm subsided. However, ‘traffic’ remained prominent until 25 May, implying that the travel disruptions caused by the rainstorm likely persisted for several days. A review of the relevant texts showed that the main traffic concern was the resumption of the subway line that had been forced to shut down by the rainstorm. The ‘life & property loss’ topic had a longer life span. Notably, texts on ‘life & property loss’ resurged significantly between 25 and 26 May, mainly concerning an economic dispute between vehicle owners and property management companies over compensation for vehicles damaged in flooded garages.

Figure 3 shows the fluctuations in the number of texts related to the five impact topics throughout the rainstorm day (Friday, 22 May 2020). The volume of impact-related texts closely followed a typical weekday pattern, with midnight–6:00, 6:00–18:00, and 18:00–midnight corresponding to sleeping hours, working hours, and after-work leisure time, respectively. Texts on all five topics remained few in the early morning from midnight to 6:00. During working hours, the text volumes of most topics surged to varying degrees. ‘Traffic’, ‘work’, ‘infrastructure failure’, and ‘life & property loss’ peaked during the morning rush hours (8:00–10:00), whereas ‘education’ peaked between 12:00 and 14:00. Even at its peak, however, the volume of ‘infrastructure failure’ texts remained low, with only minor fluctuations. From 16:00–18:00 onward, into the after-work leisure hours, the text volumes of all five topics became relatively stable. Interestingly, ‘traffic’ resurged between 18:00 and 20:00, possibly owing to increased transportation demand during the evening rush hours.
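Binning of this kind can be sketched as follows. It assumes each classified text is stored as a hypothetical `(posted_at, topics)` pair, where `topics` is the set of labels from the multi-label classifier, so a text with two topics is counted under both. Using `hours=6` corresponds to the interval of Figure 2; `hours=2` matches the 2-h windows quoted above for Figure 3, although the exact bin width of that figure is an assumption.

```python
from collections import Counter, defaultdict
from datetime import datetime

def bin_counts(records, start, hours):
    """Count texts per topic in consecutive `hours`-long bins starting at `start`.

    records: iterable of (posted_at, set_of_topics); a multi-label text is
    counted once under each of its topics. Returns {topic: Counter(bin -> count)}.
    """
    counts = defaultdict(Counter)
    width = hours * 3600  # bin width in seconds
    for posted_at, topics in records:
        index = int((posted_at - start).total_seconds() // width)
        for topic in topics:
            counts[topic][index] += 1
    return counts
```

Bin `i` covers `[start + i*hours, start + (i+1)*hours)`; with `start` at midnight on 22 May and `hours=2`, bin 4 is 8:00–10:00.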

4.2. Sentiment Analysis

4.2.1. Public Sentiment Responding to the Disaster

Public sentiment expressed on the social media platform during and after the rainstorm event was evaluated using the sentiment lexicon. Figure 4 presents the temporal variation in the scores of the seven sentiment categories. On the rainstorm day, the predominant sentiment was the positive sentiment good, accounting for 28.68% of all sentiments expressed, followed by two negative sentiments, fear and disgust, at 22.07% and 21.27%, respectively (Table 6). Joy was the second most common positive sentiment, at 13.63%. During the post-rainstorm stage, good remained in first place, with its share rising further to 37.29%. Joy also increased markedly, to 21.98%, ranking second among all sentiments. Disgust was the most prominent negative sentiment after the rainstorm subsided, although its share declined slightly to 18.00%. Fear changed the most, falling substantially to 9.75%. Sadness was the only negative sentiment whose share increased, from 8.72% on the rainstorm day to 10.80% during the post-rainstorm stage. Surprise was the least frequently expressed positive sentiment throughout the whole period, while anger remained at a negligible level.
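The stage-level percentages could be obtained as sketched below. The source does not state how the shares in Table 6 were computed, so this sketch assumes each category's share is its summed score divided by the summed scores of all categories in that stage; the input format `(stage, {category: score})` is hypothetical.

```python
from collections import defaultdict

def stage_proportions(text_scores):
    """Percentage share of each emotion category within each stage.

    text_scores: iterable of (stage, {category: score}) pairs, e.g. the output
    of a per-text lexicon scorer. Assumption: shares come from summed scores.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for stage, scores in text_scores:
        for category, value in scores.items():
            totals[stage][category] += value
    result = {}
    for stage, cats in totals.items():
        grand = sum(cats.values())
        if grand == 0:
            continue  # no sentiment expressed in this stage
        result[stage] = {c: 100.0 * v / grand for c, v in cats.items()}
    return result
```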

A review of the social media data revealed that the positive sentiment of good during the rainstorm mainly comprised encouragement to overcome the difficulty, wishes for peace, and trust in the government. Joy was mainly expressed by individuals who were not affected by the rainstorm event and felt happiness and a sense of security. In contrast, negative sentiments mainly consisted of disgust related to anxiety about the damage and questions about the urban drainage system, as well as fear evoked by the frightening scenes witnessed during the rainstorm. During the post-rainstorm stage, the high proportions of good and joy were mainly attributable to the many blessings for the affected people and their rescuers, particularly on 23 May, when people were first learning about the disaster situation. Individuals’ moods were also lifted by the beautiful scenery after the rainstorm. As for negative sentiments during the post-rainstorm stage, the disgust and sadness expressed were mainly linked to incurred property losses and the disruption of subway operations.

4.2.2. The Relationship between Disaster Impacts and Social Media Sentiment

To investigate the relationship between public sentiment and disaster impact, a t-test was employed to examine whether sentiment differed significantly between individuals who experienced different aspects of the disaster impacts and those who were unaffected. Table 7 compares the average sentiment scores of texts related to each specific impact topic with those of texts unrelated to disaster impacts (unrelated texts). The results suggest that people affected by the disaster tended to express less positive sentiment than those who were unaffected. Specifically, texts related to the impact topics of ‘work’, ‘education’, and ‘infrastructure failure’ had significantly lower good scores than the unrelated texts. Meanwhile, texts pertaining to ‘life & property loss’, ‘traffic’, and ‘infrastructure failure’ exhibited significantly lower joy scores than the unrelated texts.
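The comparison behind Table 7 can be sketched with an independent two-sample t statistic. The text does not state which t-test variant was used. The sketch below assumes Welch's unequal-variance form, and the score lists are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-text 'good' scores: one impact topic vs. unrelated texts.
topic_scores = [1.0, 2.0, 3.0, 2.0]
unrelated_scores = [3.0, 4.0, 5.0, 4.0]
t, df = welch_t(topic_scores, unrelated_scores)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A negative t indicates that the topic texts score lower than the unrelated texts. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic together with its two-sided p-value.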

Regarding negative sentiments, texts related to ‘life & property loss’ exhibited significantly higher disgust and sadness scores than unrelated texts, and texts related to ‘work’ also showed an elevated disgust score. In contrast, texts pertaining to all five impact topics had significantly lower fear scores than unrelated texts.

4.3. Spatial Analysis

4.3.1. Identification of Waterlogging Spots

Instead of relying on user-generated location tags, specific place terms were extracted directly from the text content to identify impacted locations. Genuine waterlogging spots were then manually filtered by screening the microblog texts, yielding 223 waterlogging spots. The distribution of waterlogging spots and the number of identified spots in each district (grouped into intervals) are illustrated in Figure 5a. Table 8 presents the counts and proportions of waterlogging spots across all 11 districts of Guangzhou. Huangpu had the largest number of waterlogging spots (81 spots, accounting for 35.5% of all waterlogging spots), followed by Tianhe (46 spots, 21.5%) and Zengcheng (41 spots, 17.8%). Together, these three districts accounted for the dominant share (74.8%) of all waterlogging spots.
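The place-term extraction step can be approximated with a simple gazetteer lookup. This is a swapped-in, simplified technique rather than the NER model the study used. The `GAZETTEER` and `WATERLOGGING_CUES` lists are hypothetical. In the study, candidate spots were additionally screened by hand.

```python
import re

# Hypothetical gazetteer of Guangzhou place names. A real pipeline would use a
# trained NER model and a much larger, curated place-name list.
GAZETTEER = ["天河", "天河路", "黄埔", "科学城", "体育西路"]
# Hypothetical cue words: "flooded", "ponding water", "waterlogging".
WATERLOGGING_CUES = ("水浸", "积水", "内涝")

# Try longer names first so "天河路" is matched whole rather than as "天河".
PATTERN = re.compile("|".join(sorted(map(re.escape, GAZETTEER), key=len, reverse=True)))


def extract_places(text):
    """Return gazetteer places in a text, but only if the text reports waterlogging."""
    if not any(cue in text for cue in WATERLOGGING_CUES):
        return []
    return PATTERN.findall(text)


print(extract_places("体育西路积水严重，车都开不动了"))
print(extract_places("今天天河下大雨"))  # no waterlogging cue, so nothing is returned
```

Ordering the alternation by length matters because Python's `re` takes the first alternative that matches at a position, not the longest.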

Based on the location descriptions extracted from social media data, the waterlogging spots were categorized into three types: traffic-related area spots, residential area spots, and public area spots (Figure 5a). Traffic-related area spots (i.e., urban roads and public transport stations) numbered 133, accounting for 59.6% of all waterlogging spots. Residential area spots and public area spots numbered 47 and 43, accounting for 21.1% and 19.3%, respectively. A selection of the main waterlogging spots is presented in Table 9. Traffic-related area spots comprised public transportation stations, arterial and collector roads, and other urban roads and tunnels. Residential area spots primarily comprised gated neighborhoods and urban villages. Public area spots mainly encompassed commercial areas, industrial areas, and education and research institutions.
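The following rule-based sketch illustrates the three-way categorization and reproduces the percentages reported above from the spot counts. The suffix rules in `RULES` are hypothetical. The study categorized spots from their location descriptions, and this sketch does not reproduce that procedure.

```python
# Hypothetical suffix rules for assigning a place name to a location type.
RULES = [
    ("traffic", ("路", "站", "隧道", "大道")),  # road, station, tunnel, avenue
    ("residential", ("小区", "村", "花园")),    # neighborhood, village, garden estate
]


def spot_type(name):
    """Return the first matching type; anything unmatched falls back to 'public'."""
    for label, suffixes in RULES:
        if name.endswith(suffixes):  # str.endswith accepts a tuple of suffixes
            return label
    return "public"  # e.g., commercial, industrial, education and research sites


def shares(counts):
    """Percentage share of each type, rounded to one decimal place."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


print(spot_type("体育西路"), spot_type("棠下村"), spot_type("科学城"))
# Counts reported in the text: 133 traffic, 47 residential, 43 public (n = 223).
print(shares({"traffic": 133, "residential": 47, "public": 43}))
```

The counts sum to 223, matching the total number of waterlogging spots identified in Section 4.3.1.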

4.3.2. Spatial Pattern of Impacted Areas

Regarding the spatial agglomeration pattern of waterlogging spots, Table 10 shows that the global Moran’s I was positive, indicating a spatial agglomeration effect in the areas impacted by the rainstorm event. The z-values and p-values further indicate that this spatial autocorrelation was significant. The spatial distribution of clusters of impacted areas was also mapped. As shown in Figure 5b, a high–high cluster represents positive spatial autocorrelation, in which high-value areas are surrounded by high-value neighboring areas. A high–low outlier occurs when high-value areas are surrounded by low-value neighboring areas, reflecting negative spatial autocorrelation. A low–high outlier, which also reflects negative spatial autocorrelation, occurs when low-value areas are surrounded by high-value neighboring areas. The results reveal that the most severely impacted areas (high–high clusters) were mainly concentrated in the geographically central region of Guangzhou. Specifically, southeastern Tianhe, southern Huangpu, and southwestern Zengcheng exhibited the densest high–high agglomeration grids, suggesting that these regions were severely affected. High–high agglomeration grids were also observed in parts of the historic central urban districts, including northern Haizhu, southern Yuexiu, eastern Liwan, and southern Baiyun. In addition, high–low grids were sparsely distributed on the periphery of the high–high agglomeration grids and in the suburban areas of Guangzhou, indicating that these regions were also affected by waterlogging resulting from the rainstorm event.
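Global Moran's I and the local cluster labels can be sketched on a regular grid. Let wij be the weights, zi = xi − x̄ the deviations, and S0 the sum of all weights. Then I = (n / S0) · Σi Σj wij zi zj / Σi zi². The sketch assumes several things not stated in the text: binary rook-contiguity weights, a hypothetical 4 × 4 grid, and cell values equal to waterlogging-spot counts. The study's grid size and weighting scheme are not given here. The quadrant labels skip the permutation test, so unlike Figure 5b they are not filtered for significance.

```python
def rook_neighbors(rows, cols):
    """Binary rook-contiguity neighbors (shared edges) for a row-major grid."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def global_morans_i(x, nbrs):
    """Global Moran's I with binary weights w_ij = 1 for neighboring cells."""
    n = len(x)
    xbar = sum(x) / n
    z = [v - xbar for v in x]
    s0 = sum(len(js) for js in nbrs.values())  # S0: sum of all weights
    cross = sum(z[i] * z[j] for i, js in nbrs.items() for j in js)
    return (n / s0) * cross / sum(v * v for v in z)


def lisa_quadrants(x, nbrs):
    """Label cells HH/HL/LH/LL by own deviation vs. neighbors' mean deviation.

    No permutation test is run, so labels are NOT filtered for significance.
    """
    xbar = sum(x) / len(x)
    z = [v - xbar for v in x]
    labels = []
    for i in range(len(x)):
        lag = sum(z[j] for j in nbrs[i]) / len(nbrs[i])  # row-standardized lag
        labels.append(("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L"))
    return labels


# Hypothetical 4 x 4 grid of waterlogging-spot counts per cell.
spot_counts = [3, 2, 0, 0,
               4, 3, 0, 1,
               2, 3, 1, 0,
               0, 1, 0, 0]
w = rook_neighbors(4, 4)
print(round(global_morans_i(spot_counts, w), 3))
print(lisa_quadrants(spot_counts, w))
```

A perfect checkerboard gives I = −1 under these weights, and a grid split into two uniform halves gives a clearly positive I. Both are useful sanity checks. For real analyses, PySAL's `esda` package provides `Moran` and `Moran_Local` with permutation-based inference.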

4.3.3. Influencing Factors on the Impacted Areas from Social Media

To explore the factors influencing the impacted areas identified through social media, binary logistic regression analysis was employed to investigate the relationship between waterlogging occurrence and predisposing factors, including natural hazards, topographic factors, land surface conditions, and anthropogenic factors. Table 11 reveals that three factors, namely precipitation, population density, and the proportion of built-up surfaces, had a significant influence on waterlogging occurrence at the 0.01 level, while road density was significant at the 0.05 level.
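The following is a minimal sketch of binary logistic regression fitted by batch gradient ascent on synthetic data. The two predictors, their 0–1 scaling, and the data-generating coefficients are hypothetical. The study's model also included elevation, slope, NDVI, the proportion of water surface, population density, and road density.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(X, y, lr=1.0, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(b0 + b . x) by batch gradient ascent on the log-likelihood."""
    k, n = len(X[0]), len(y)
    b0, b = 0.0, [0.0] * k
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            err = yi - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            g0 += err  # d(log-likelihood)/d(b0)
            for j in range(k):
                g[j] += err * xi[j]  # d(log-likelihood)/d(b_j)
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


# Hypothetical grid cells: [precipitation, built-up share], both scaled to 0-1.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(300)]
# Synthetic labels: waterlogging is made more likely where both predictors are high.
y = [1 if random.random() < sigmoid(-4 + 4 * p + 3 * u) else 0 for p, u in X]

b0, (b_precip, b_built) = fit_logistic(X, y)
print(f"intercept={b0:.2f}  precipitation={b_precip:.2f}  built-up={b_built:.2f}")
```

Significance levels such as 0.01 and 0.05 come from Wald tests on the fitted coefficients. Statistical packages report these directly, for example the `Logit` model in `statsmodels`.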

Precipitation had the largest logistic coefficient (5.873), indicating a strong association between natural hazard intensity and the affected areas. Among land surface characteristics, the proportion of built-up areas was the second most influential factor, with a logistic coefficient of 4.92, indicating a substantial influence on waterlogging occurrence. Furthermore, population density and road density, which are anthropogenic factors reflecting the level of urbanization, were also significantly associated with waterlogging incidents. In contrast, elevation, slope, NDVI, and the proportion of water surface showed no significant influence on waterlogging occurrence.
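Logistic coefficients are often easier to read as odds ratios, exp(β). The sketch below applies this conversion to the two coefficients reported above. This section does not state how the predictors were scaled. Each ratio therefore applies per one unit of the predictor as entered in the model, and the two ratios are directly comparable only if the predictors share a common scale.

```python
import math

# Coefficients reported in the text. Predictor scaling is not stated here, so
# exp(beta) is the odds multiplier per one unit of the predictor as entered in the model.
coefficients = {"precipitation": 5.873, "built-up proportion": 4.92}

odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds multiplied by {ratio:.1f} per unit increase")
```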

5. Discussion

In previous studies, efforts have been made to demonstrate the potential utility of social media data in augmenting situational awareness and facilitating rapid damage assessment during disasters [13,45]. Building on this knowledge, the present study proposes a framework for harnessing social media data to assess the impacts of an urban flood disaster across multiple dimensions, using the 2020 Guangzhou rainstorm event as a case study. The proposed framework used the textual content of social media data to ascertain public perception of the aspects affected during the disaster and to identify the impacted areas. The key findings and their practical implications are elaborated below.

5.1. Key Findings and Reflections

The method proposed in this study supports the use of social media data for rapidly assessing disaster impacts and tracking their temporal evolution. Previous applications have analyzed disaster impacts with social media data either by assessing the intensity of social media activity or by extracting semantic information from social media texts. To investigate specific aspects of impact during a disaster, rather than quantifying the impact with a single metric, this study employed semantic analysis to evaluate the disaster impacts. In line with previous studies [11,14], this work corroborated that the textual content of social media data contains information on disaster impacts. Based on a synergy of word frequency analysis and machine learning algorithms, the impact severity of each topic was approximated by the number of impact-related texts, indicating that ‘traffic’ and ‘life & property loss’ were the most affected topics. Meanwhile, compared to previous relevant studies [35,48], this study highlighted the disparities in the distribution of impact topics between the rainstorm day and the post-rainstorm stage. Specifically, ‘traffic’ was the dominant topic on the rainstorm day, and impact topics tied to people’s daily activities, such as ‘work’ and ‘education’, were also significant. During the post-rainstorm stage, public attention then shifted largely to ‘life & property loss’. From the perspective of temporal analysis, this study incorporated two distinct time scales to examine the evolution of disaster impacts. By monitoring the day-to-day variations in impact-related texts during and after the rainstorm event, the persistence and recovery of the various impact topics could be analyzed [14].
Regarding the intra-day hourly variation on the rainstorm day, this study confirmed the findings of previous studies [11,18] that disaster-related posts generated by social media users closely align with people’s typical daily routines.

This study also incorporated sentiment information extracted from social media. It extends previous research that primarily focused on sentiment polarity (positive/negative) [14,36] by employing a sentiment analysis approach capable of a more fine-grained classification, based on the seven sentiment categories defined by the DLUT-Emotion ontology. The findings show that good was consistently the most prevalent sentiment throughout the entire study period. This held even on the rainstorm day, despite negative sentiment temporarily surpassing positive sentiment. Fear and disgust were the top two negative sentiment types on the rainstorm day, while disgust and sadness were the most prevalent negative sentiments during the post-rainstorm stage. Furthermore, this study examined the influence of disaster impacts on public sentiment by comparing sentiment scores between texts pertaining to specific impact topics and unrelated texts. A previous study indicated that individuals are more inclined to express negative sentiments when describing the impact of disasters [57]. This study further examined the disparities among specific impact topics. Individuals who experienced impacts related to ‘life & property loss’ and ‘work’ were more likely to express disgust, and sadness was also significantly associated with the impact topic of ‘life & property loss’. Notably, the average fear score of impact-related texts was unexpectedly lower than that of unrelated texts. Screening the social media corpus showed that the fear depicted in the texts was primarily attributable to expressions of shock and fear when individuals encountered extreme torrential rain and lightning, rather than to the impact topics.

From the perspective of spatial analysis, previous studies have sought to determine the spatial pattern of disaster severity using manually tagged geo-information or registration locations provided by users [20,48]. However, geo-tagged information is scarce and may not be sufficient to support disaster assessment, as only a very small proportion of users tag their location when posting [13,15]. Meanwhile, registration locations merely disclose the cities or counties where users reside, and therefore mainly support analysis at the administrative scale [34,58]. By using Named Entity Recognition (NER) to extract location information directly from microblog texts, this study demonstrates how social media can be applied when geo-tagged information is scarce, increasing the usefulness of social media data that lack geo-tags. Additionally, the extracted waterlogging spots allow the spatial pattern of the affected areas to be mapped without being constrained by administrative divisions. Similarly, another study confirmed the utility of picture-based social media (e.g., Flickr) for rapid flood mapping [41]. Furthermore, this study extended previous research by investigating the spatial agglomeration of the affected areas and their correlation with various predisposing factors. The Moran’s I analysis revealed a significant agglomeration effect in the areas affected by the rainstorm, with clusters predominantly located in the geographically central regions of Guangzhou, where the most intense precipitation occurred and the historic central urban districts are located. Moreover, the binary logistic regression analysis demonstrated significant correlations between waterlogging occurrence and precipitation, population density, the proportion of built-up surfaces, and road density.
The findings indicate that the impacted areas were associated with both the intensity of the natural hazard and the degree of urbanization. This aligns with the argument of a previous study [59] that the identification of potentially affected areas should consider both natural hazard factors and the susceptibility of the exposed sections.

5.2. Implications for Practice

From a practical perspective, the present study provides empirical evidence to support decision-makers in incorporating social media as an alternative data source for disaster management.

First, social media data can be used for rapid assessment and dynamic tracking of disaster impacts, thereby providing valuable information for disaster management. By combining word frequency analysis and machine learning algorithms, this study confirmed the value of social media data for detecting public perception of disaster impacts, encompassing both tangible physical damage and disruptions to urban functions. Government agencies should tailor measures to local conditions and develop coping strategies according to the specific impact topics. From a temporal perspective, disaster management can be divided into three distinct phases, namely pre-disaster, during-disaster, and post-disaster [49], each requiring different disaster management measures. This study primarily focused on the during-disaster phase (the rainstorm day) and the post-disaster phase (the post-rainstorm stage). When confronted with sudden-onset disasters such as urban floods, government agencies inevitably face challenges due to limited manpower and material resources [35]. This study provides evidence for the effectiveness of employing social media to monitor public perception of incidents during and after disasters. Consequently, this approach can help authorities and individuals gain a comprehensive understanding of rescue and recovery needs, facilitating timely and appropriate response actions.

In disaster response and recovery, physical damage is frequently accompanied by emotional trauma, highlighting the importance of comprehensively addressing individuals’ psychological needs during and after disasters [11]. This study offers decision-makers a valuable perspective. It demonstrates that social media can facilitate the assessment of the intensity of negative sentiments and can also deepen the understanding of the specific categories of negative sentiment (e.g., fear, disgust, and sadness) arising from disasters. Moreover, the results reveal notable associations between public sentiment and the disaster impacts perceived on social media. Although different impact topics displayed varying associations with sentiments, specific aspects of impact (such as ‘life & property loss’ and ‘work’) were associated with more pronounced negative sentiments. Therefore, it is crucial to monitor sentiment when supporting individuals affected by a disaster [54].

Timely identification of impacted areas is crucial for decision-makers to prioritize rescue and recovery efforts. This study showcased the utility of locations extracted from microblog texts for providing a clear understanding of the impacted areas. Unlike studies that rely on geo-tagged microblogs [20,48], the approach of extracting locations from textual content is not limited by users’ willingness to disclose their geolocations. Moreover, the significant correlations between waterlogging occurrence and several predisposing factors, including precipitation, the proportion of built-up area, population density, and road density, can offer valuable insights into the key areas for building disaster resilience. The findings highlight the need to enhance the response capability of areas with a high built-up proportion, high population density, and high road density, where waterlogging tends to cause significant losses or disruption to people’s daily lives [8]. In such areas, adaptation measures, such as drainage networks, permeable surfaces, and protection against roadbed subsidence, should be prioritized to reduce the risks associated with waterlogging.

5.3. Limitations and Future Prospects

There are several limitations in this study. Firstly, although numerous studies have demonstrated the application of social media in disaster management, ensuring the credibility and comprehensiveness of online information remains a challenge. Social media texts inevitably contain some false or irrelevant information, which affects data accuracy. Meanwhile, social media activity reflects the demographic composition of users. Some people (e.g., the elderly and children) may not post their experiences and opinions on social media platforms, which leaves the data incomplete. Therefore, more effective information extraction approaches need to be explored. Moreover, this study investigated the impact of urban flooding using only textual content from the Sina Weibo platform, which limits the information available for disaster situational awareness. In particular, identifying impacted areas from microblog texts alone may underestimate waterlogging occurrence. However, various alternative data sources can also be used for disaster impact assessment. For instance, other popular social media platforms, such as TikTok (a popular short-video social networking service), could provide more intuitive information about the disaster situation. Additionally, data extracted from digital map apps (e.g., Baidu Map, Gaode Map) can provide traffic-related information during a disaster [60]. Consequently, future research could integrate multiple data sources to gain a holistic understanding of the disaster situation [61].

Secondly, this study specifically focused on extracting information about the impact of urban flooding and therefore primarily investigated the during-disaster and post-disaster stages of the rainstorm event, leaving the pre-disaster stage unexplored. Future research could extend the application of social media data to all stages of the disaster process (pre-disaster, during-disaster, and post-disaster), investigating the evolution of social media topics and comparing public sentiment across the different stages of a disaster.

6. Conclusions

The present study conducted an empirical investigation of disaster-related microblogs posted on the Weibo social media platform during the 2020 Guangzhou rainstorm event, contributing to both academic and practical advances in using social media data to enhance disaster impact assessment. The main conclusions are as follows. First, this study validated social media as a feasible means of dynamically acquiring impact-related information, including tangible damage and interruptions to urban functions, during and after the rainstorm event. In this case, the distribution of impact topics differed across the phases of the disaster. Regarding public sentiment, this study explored the relationship between disaster impacts and seven sentiment categories, indicating that people who experienced impacts related to ‘life & property loss’ and ‘work’ were more likely to express stronger negative sentiments. For the spatial analysis, this study demonstrated the utility of content-based location information derived from social media texts as a valuable supplementary source for identifying impacted areas. Furthermore, this study substantiated the correlations between the areas impacted during urban flood disasters and multiple predisposing factors, including precipitation, population density, the proportion of built-up surfaces, and road density.

Author Contributions

Conceptualization, S.L., J.H. and J.W.; methodology, S.L. and J.W.; software, S.L.; validation, S.L., J.H. and J.W.; formal analysis, S.L.; investigation, S.L.; resources, S.L.; data curation, S.L.; writing—original draft preparation, S.L.; writing—review and editing, S.L.; visualization, S.L.; supervision, J.H.; project administration, J.H.; funding acquisition, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

Klomp, J. Economic development and natural disasters: A satellite data analysis. Glob. Environ. Chang. 2016, 36, 67–88.
McWethy, D.B.; Schoennagel, T.; Higuera, P.E.; Krawchuk, M.; Harvey, B.J.; Metcalf, E.C.; Schultz, C.; Miller, C.; Metcalf, A.L.; Buma, B.; et al. Rethinking resilience to wildfire. Nat. Sustain. 2019, 2, 797–804.
Hammond, M.J.; Chen, A.S.; Djordjević, S.; Butler, D.; Mark, O. Urban flood impact assessment: A state-of-the-art review. Urban Water J. 2013, 12, 14–29.
Song, J.; Chang, Z.; Li, W.; Feng, Z.; Wu, J.; Cao, Q.; Liu, J. Resilience-vulnerability balance to urban flooding: A case study in a densely populated coastal city in China. Cities 2019, 95, 102381.
United Nations. World Urbanization Prospects: The 2018 Revision; United Nations: New York, NY, USA, 2018.
Kundzewicz, Z.W.; Kanae, S.; Seneviratne, S.I.; Handmer, J.; Nicholls, N.; Peduzzi, P.; Mechler, R.; Bouwer, L.M.; Arnell, N.; Mach, K.; et al. Flood risk and climate change: Global and regional perspectives. Hydrol. Sci. J. 2013, 59, 1–28.
Liao, K.-H.; Le, T.A.; Nguyen, K.V. Urban design principles for flood resilience: Learning from the ecological wisdom of living with floods in the Vietnamese Mekong Delta. Landsc. Urban Plan. 2016, 155, 69–78.
Wang, P.; Li, Y.; Zhang, Y. An urban system perspective on urban flood resilience using SEM: Evidence from Nanjing city, China. Nat. Hazards 2021, 109, 2575–2599.
Liao, K.-H. A Theory on Urban Resilience to Floods—A Basis for Alternative Planning Practices. Ecol. Soc. 2012, 17, 4.
Yu, Q.; Wang, Y.; Li, N. Extreme Flood Disasters: Comprehensive Impact and Assessment. Water 2022, 14, 1211.
Shan, S.; Zhao, F.; Wei, Y.; Liu, M. Disaster management 2.0: A real-time disaster damage assessment model based on mobile social media data—A case study of Weibo (Chinese Twitter). Saf. Sci. 2019, 115, 393–413.
Hao, H.; Wang, Y. Leveraging multimodal social media data for rapid disaster damage assessment. Int. J. Disaster Risk Reduct. 2020, 51, 101760.
Li, L.; Bensi, M.; Cui, Q.; Baecher, G.B.; Huang, Y. Social media crowdsourcing for rapid damage assessment following a sudden-onset natural hazard event. Int. J. Inf. Manag. 2021, 60, 102378.
Tan, L.; Schultz, D.M. Damage classification and recovery analysis of the Chongqing, China, floods of August 2020 based on social-media data. J. Clean. Prod. 2021, 313, 127882.
Xing, Z.; Zhang, X.; Zan, X.; Xiao, C.; Li, B.; Han, K.; Liu, Z.; Liu, J. Crowdsourced social media and mobile phone signaling data for disaster impact assessment: A case study of the 8.8 Jiuzhaigou earthquake. Int. J. Disaster Risk Reduct. 2021, 58, 102200.
Zhong, X.; Duckham, M.; Chong, D.; Tolhurst, K. Real-time estimation of wildfire perimeters from curated crowdsourcing. Sci. Rep. 2016, 6, 24206.
Hu, Y. Geo-text data and data-driven geospatial semantics. Geogr. Compass 2018, 12, e12404.
Wang, B.; Loo, B.P.Y.; Zhen, F.; Xi, G. Urban resilience from the lens of social media data: Responses to urban flooding in Nanjing, China. Cities 2020, 106, 102884.
Restrepo-Estrada, C.; de Andrade, S.C.; Abe, N.; Fava, M.C.; Mendiondo, E.M.; de Albuquerque, J.P. Geo-social media as a proxy for hydrometeorological data for streamflow estimation and to improve flood monitoring. Comput. Geosci. 2018, 111, 148–158.
Wu, W.; Li, J.; He, Z.; Ye, X.; Zhang, J.; Cao, X.; Qu, H. Tracking spatio-temporal variation of geo-tagged topics with social media in China: A case study of 2016 hefei rainstorm. Int. J. Disaster Risk Reduct. 2020, 50, 101737.
Simon, T.; Goldberg, A.; Adini, B. Socializing in emergencies—A review of the use of social media in emergency situations. Int. J. Inf. Manag. 2015, 35, 609–619.
Xiao, Y.; Huang, Q.; Wu, K. Understanding social media data for disaster management. Nat. Hazards 2015, 79, 1663–1679.
Wang, Z.; Ye, X. Social media analytics for natural disaster management. Int. J. Geogr. Inf. Sci. 2017, 32, 49–72.
Wu, D.; Cui, Y. Disaster early warning and damage assessment analysis using social media data and geo-location information. Decis. Support Syst. 2018, 111, 48–59.
Kryvasheyeu, Y.; Chen, H.H.; Obradovich, N.; Moro, E.; Van Hentenryck, P.; Fowler, J.; Cebrian, M. Rapid assessment of disaster damage using social media activity. Sci. Adv. 2016, 2, e1500779.
Alexander, D.E. Social media in disaster risk reduction and crisis management. Sci. Eng. Ethics 2014, 20, 717–733.
Luna, S.; Pennock, M.J. Social media applications and emergency management: A literature review and research agenda. Int. J. Disaster Risk Reduct. 2018, 28, 565–577.
Steiger, E.; de Albuquerque, J.P.; Zipf, A. An Advanced Systematic Literature Review on Spatiotemporal Analyses of Twitter Data. Trans. GIS 2015, 19, 809–834.
Huang, Q.; Xiao, Y. Geographic Situational Awareness: Mining Tweets for Disaster Preparedness, Emergency Response, Impact, and Recovery. ISPRS Int. J. Geo-Inf. 2015, 4, 1549–1568.
Houston, J.B.; Hawthorne, J.; Perreault, M.F.; Park, E.H.; Goldstein Hode, M.; Halliwell, M.R.; Turner McGowen, S.E.; Davis, R.; Vaid, S.; McElderry, J.A.; et al. Social media and disasters: A functional framework for social media use in disaster planning, response, and research. Disasters 2015, 39, 1–22.
Guan, X.; Chen, C. Using social media data to understand and assess disasters. Nat. Hazards 2014, 74, 837–850.
Zou, L.; Lam, N.S.N.; Cai, H.; Qiang, Y. Mining Twitter Data for Improved Understanding of Disaster Resilience. Ann. Am. Assoc. Geogr. 2018, 108, 1422–1441.
Yuan, F.; Liu, R. Mining Social Media Data for Rapid Damage Assessment during Hurricane Matthew: Feasibility Study. J. Comput. Civ. Eng. 2020, 34, 05020001.
Yuan, F.; Liu, R. Feasibility study of using crowdsourcing to identify critical affected areas for rapid damage assessment: Hurricane Matthew case study. Int. J. Disaster Risk Reduct. 2018, 28, 758–767.
Wu, K.; Wu, J.; Ding, W.; Tang, R. Extracting disaster information based on Sina Weibo in China: A case study of the 2019 Typhoon Lekima. Int. J. Disaster Risk Reduct. 2021, 60, 102304.
Wang, Y.; Taylor, J.E. Coupling sentiment and human mobility in natural disasters: A Twitter-based study of the 2014 South Napa Earthquake. Nat. Hazards 2018, 92, 907–925.
Huang, Q.; Wong, D.W.S. Activity patterns, socioeconomic status and urban spatial structure: What can social media data tell us? Int. J. Geogr. Inf. Sci. 2016, 30, 1873–1898.
Wang, Y.; Wang, T.; Ye, X.; Zhu, J.; Lee, J. Using Social Media for Emergency Response and Urban Sustainability: A Case Study of the 2012 Beijing Rainstorm. Sustainability 2015, 8, 25.
Fohringer, J.; Dransch, D.; Kreibich, H.; Schröter, K. Social media as an information source for rapid flood inundation mapping. Nat. Hazards Earth Syst. Sci. 2015, 15, 2725–2738.
Jongman, B.; Wagemaker, J.; Romero, B.; de Perez, E. Early Flood Detection for Rapid Humanitarian Response: Harnessing Near Real-Time Satellite and Twitter Signals. ISPRS Int. J. Geo-Inf. 2015, 4, 2246–2266.
Rosser, J.F.; Leibovici, D.G.; Jackson, M.J. Rapid flood inundation mapping using social media, remote sensing and topographic data. Nat. Hazards 2017, 87, 103–120.
Li, Z.; Wang, C.; Emrich, C.T.; Guo, D. A novel approach to leveraging social media for rapid flood mapping: A case study of the 2015 South Carolina floods. Cartogr. Geogr. Inf. Sci. 2017, 45, 97–110.
Yang, T.; Xie, J.; Li, G.; Zhang, L.; Mou, N.; Wang, H.; Zhang, X.; Wang, X. Extracting Disaster-Related Location Information through Social Media to Assist Remote Sensing for Disaster Analysis: The Case of the Flood Disaster in the Yangtze River Basin in China in 2020. Remote Sens. 2022, 14, 1199.
Eilander, D.; Trambauer, P.; Wagemaker, J.; van Loenen, A. Harvesting Social Media for Generation of Near Real-time Flood Maps. Procedia Eng. 2016, 154, 176–183.
Dou, M.; Wang, Y.; Gu, Y.; Dong, S.; Qiao, M.; Deng, Y. Disaster damage assessment based on fine-grained topics in social media. Comput. Geosci. 2021, 156, 104893.
Mihunov, V.V.; Jafari, N.H.; Wang, K.; Lam, N.S.N.; Govender, D. Disaster Impacts Surveillance from Social Media with Topic Modeling and Feature Extraction: Case of Hurricane Harvey. Int. J. Disaster Risk Sci. 2022, 13, 729–742.
Rainey, J.L.; Brody, S.D.; Galloway, G.E.; Highfield, W.E. Assessment of the growing threat of urban flooding: A case study of a national survey. Urban Water J. 2021, 18, 375–381.
Fang, J.; Hu, J.; Shi, X.; Zhao, L. Assessing disaster impacts and response using social media data in China: A case study of 2016 Wuhan rainstorm. Int. J. Disaster Risk Reduct. 2019, 34, 275–282.
Kankanamge, N.; Yigitcanlar, T.; Goonetilleke, A.; Kamruzzaman, M. Determining disaster severity through social media analysis: Testing the methodology with South East Queensland Flood tweets. Int. J. Disaster Risk Reduct. 2020, 42, 101360.
Chen, Z.; Lim, S. Social media data-based typhoon disaster assessment. Int. J. Disaster Risk Reduct. 2021, 64, 102482.
Aoudi, S.; Malik, A. Lexicon Based Sentiment Comparison of iPhone and Android Tweets During the Iran-Iraq Earthquake. In Proceedings of the 5th International Conference on HCT Information Technology Trends (ITT) 2018, Dubai, United Arab Emirates, 28–29 November 2018; IEEE: Piscataway, NJ, USA, 2018.
Mendon, S.; Dutta, P.; Behl, A.; Lessmann, S. A Hybrid Approach of Machine Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter Data of Natural Disasters. Inf. Syst. Front. 2021, 23, 1145–1168.
Ragini, J.R.; Anand, P.M.R.; Bhaskar, V. Big data analytics for disaster response and recovery through sentiment analysis. Int. J. Inf. Manag. 2018, 42, 13–24. [Google Scholar] [CrossRef]
Guo, D.; Zhao, Q.; Chen, Q.; Wu, J.; Li, L.; Gao, H. Comparison between sentiments of people from affected and non-affected regions after the flood. Geomat. Nat. Hazards Risk 2021, 12, 3346–3357. [Google Scholar] [CrossRef]
Zhang, S.; Wei, Z.; Wang, Y.; Liao, T. Sentiment analysis of Chinese micro-blog text based on extended sentiment dictionary. Future Gener. Comput. Syst. 2018, 81, 395–403. [Google Scholar] [CrossRef]
Xu, G.; Yu, Z.; Yao, H.; Li, F.; Meng, Y.; Wu, X. Chinese Text Sentiment Analysis Based on Extended Sentiment Dictionary. IEEE Access 2019, 7, 43749–43762. [Google Scholar] [CrossRef]
Zhang, T.; Cheng, C. Temporal and Spatial Evolution and Influencing Factors of Public Sentiment in Natural Disasters—A Case Study of Typhoon Haiyan. ISPRS Int. J. Geo-Inf. 2021, 10, 299. [Google Scholar] [CrossRef]
Shoyama, K.; Cui, Q.; Hanashima, M.; Sano, H.; Usuda, Y. Emergency flood detection using multiple information sources: Integrated analysis of natural hazard monitoring and social media data. Sci. Total Environ. 2021, 767, 144371. [Google Scholar] [CrossRef]
Versini, P.A.; Gaume, E.; Andrieu, H. Assessment of the susceptibility of roads to flooding based on geographical information—test in a flash flood prone area (the Gard region, France). Nat. Hazards Earth Syst. Sci. 2010, 10, 793–803. [Google Scholar] [CrossRef]
Guo, K.; Guan, M.; Yan, H. Utilising social media data to evaluate urban flood impact in data scarce cities. Int. J. Disaster Risk Reduct. 2023, 93, 103780. [Google Scholar] [CrossRef]
Meilutytė-Lukauskienė, D.; Akstinas, V.; Vaitulionytė, M.; Tomkevičienė, A. Behaviour of the 2010 flood in Lithuania: Management and socio-economic risks. Mitig. Adapt. Strateg. Glob. Chang. 2022, 27, 23. [Google Scholar]

Figure 1. The framework of data collection and data analysis.

Figure 2. Temporal variation of impact-related text numbers for each topic.

Figure 3. Temporal variation of impact topics in urban flood disasters within the rainstorm day.

Figure 4. Temporal variation of specific sentiment types.

Figure 5. (a) The distribution of waterlogging spots; (b) the spatial clustering pattern of impacted areas at 2 km scale.

Table 1. Performance comparison of classification models.

Model	Indicator	Life & Property Loss	Traffic	Work	Education	Infrastructure Failure
SVM	Accuracy	0.945	0.967	0.941	0.986	0.871
SVM	Precision	0.946	0.968	0.942	0.985	0.878
SVM	Recall	0.946	0.967	0.942	0.986	0.869
SVM	F1	0.946	0.966	0.941	0.985	0.869
NB	Accuracy	0.858	0.825	0.846	0.841	0.806
NB	Precision	0.859	0.828	0.851	0.868	0.827
NB	Recall	0.858	0.824	0.844	0.835	0.810
NB	F1	0.858	0.824	0.845	0.836	0.805
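Table 1 reports four standard indicators for each impact topic. As a minimal sketch, assuming each topic is scored as a binary task (topic vs. not topic) on a held-out test set, the indicators can be computed from confusion counts as below. The `Confusion` class and `classification_metrics` function are hypothetical names, and this section does not state how the study averaged scores across classes.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts for one impact topic (topic vs. not topic)."""
    tp: int  # topic texts correctly labelled as the topic
    fp: int  # unrelated texts wrongly labelled as the topic
    fn: int  # topic texts the classifier missed
    tn: int  # unrelated texts correctly rejected


def classification_metrics(c: Confusion) -> dict:
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not data from the study.
    print(classification_metrics(Confusion(tp=90, fp=10, fn=5, tn=95)))
```

With the hypothetical counts above, F1 equals 2TP / (2TP + FP + FN) = 180/195, which is the same harmonic mean expressed directly in counts.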

Table 2. Data sets involved in this study.

Data	Format	Source
Social Media Data	Text	Sina Weibo Platform
Daily Precipitation	Text	Water Resources Department of Guangdong Province
Topography Data	Raster	NASA DEM
LULC Data	Raster	Institute of Geographic Sciences and Natural Resources Research
Demographic Data	Raster	Resource and Environmental Science and Data Center
Road Network	Shapefile	OpenStreetMap (OSM) Platform

Table 3. Multi-collinearity diagnostics results.

Factor	VIF	Tolerance
Elevation	3.624	0.276
Precipitation	1.37	0.73
Population Density	1.639	0.61
NDVI	2.8	0.357
Slope	3.981	0.251
Curvature	1.017	0.983
Proportion of Built-up Surface	2.427	0.412
Proportion of Water Surface	1.129	0.886
Road Density	1.543	0.648
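In Table 3 each tolerance is the reciprocal of its VIF (for example, 1/3.624 ≈ 0.276 for elevation). A VIF equals 1 / (1 − R_j^2), where R_j^2 comes from regressing predictor j on all other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The sketch below uses that identity with the Python standard library only; the function names and toy data are hypothetical and are not the study's code.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row with the identity matrix: [M | I].
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif_and_tolerance(columns):
    """Map each predictor name to (VIF, tolerance).

    VIF_j is the j-th diagonal entry of the inverse correlation matrix,
    i.e. 1 / (1 - R_j^2); tolerance is 1 / VIF_j = 1 - R_j^2.
    """
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: (inv[i][i], 1.0 / inv[i][i]) for i, name in enumerate(names)}


if __name__ == "__main__":
    # Hypothetical toy predictors with correlation 0.8.
    print(vif_and_tolerance({"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}))
```

For two predictors with correlation 0.8, the sketch returns VIF = 1/(1 − 0.64) ≈ 2.78 and tolerance 0.36 for both. All VIFs in Table 3 are below 4, under the thresholds of 5 or 10 often used as rules of thumb.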

Table 4. Five topics of disaster impact and top five keywords.

Topic	Top Five Keywords
Life & Property Loss	Vehicle, Casualties, Sweep away, Trapped, Building
Traffic	Subway, Travel, Driving, Service suspension, Traffic
Work	Office, Company, Work, Being late, Off duty
Education	Class suspension, School, University, Exam, Study
Infrastructure Failure	Power outage, Signal, Water outage, Power supply, Power usage
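Table 1 compares SVM and Naive Bayes (NB) classifiers for the five topics listed above. The following is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, using only the standard library. It assumes texts are already split into tokens (Chinese Weibo posts would first need word segmentation); the tiny training set simply reuses Table 4 keywords and is hypothetical, as are the function names. It is not the study's trained model or feature pipeline.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit a multinomial Naive Bayes model with Laplace (add-one) smoothing.

    docs: list of (tokens, label) pairs.
    """
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, len(docs)


def predict_nb(model, tokens):
    """Return the label with the highest log posterior for a tokenized text."""
    class_docs, word_counts, vocab, n_docs = model
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior of the class
        for w in tokens:
            if w not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set built from Table 4 keywords (not study data).
TRAIN = [
    (["subway", "service suspension", "travel"], "Traffic"),
    (["driving", "traffic", "subway"], "Traffic"),
    (["class suspension", "school", "exam"], "Education"),
    (["university", "study", "school"], "Education"),
    (["power outage", "signal", "water outage"], "Infrastructure Failure"),
]

model = train_nb(TRAIN)

if __name__ == "__main__":
    print(predict_nb(model, ["subway", "traffic"]))
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and the add-one term keeps a single unseen topic-word pair from zeroing out a class.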

Table 5. Number and proportion of texts representing each topic.

Topic	Total	Rainstorm Day	Post-Rainstorm
Traffic	1807	1117	690
Work	691	521	170
Education	238	197	41
Infrastructure Failure	130	85	45
Life & Property Loss	1527	588	939
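Table 5 lists counts only. One way to express them as proportions, assuming the denominator is the sum of all topic counts in the same column (a text assigned to several topics would then be counted more than once), is sketched below. The denominator choice and the function name are assumptions, not the study's stated procedure.

```python
# Counts transcribed from Table 5: (rainstorm day, post-rainstorm).
COUNTS = {
    "Traffic": (1117, 690),
    "Work": (521, 170),
    "Education": (197, 41),
    "Infrastructure Failure": (85, 45),
    "Life & Property Loss": (588, 939),
}


def topic_shares(counts):
    """Share of each topic among all topic counts, per stage and overall."""
    day_total = sum(day for day, _ in counts.values())
    post_total = sum(post for _, post in counts.values())
    grand_total = day_total + post_total
    return {
        topic: {
            "total": (day + post) / grand_total,
            "rainstorm_day": day / day_total,
            "post_rainstorm": post / post_total,
        }
        for topic, (day, post) in counts.items()
    }


if __name__ == "__main__":
    for topic, share in topic_shares(COUNTS).items():
        print(f"{topic}: " + ", ".join(f"{k}={v:.1%}" for k, v in share.items()))
```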

Table 6. Sentiment scores and proportions for specific sentiment types at different stages.

Sentiment Type	Rainstorm Day Score	Rainstorm Day Proportion	Post-Rainstorm Stage Score	Post-Rainstorm Stage Proportion
Good	36.895	28.68%	27.906	37.29%
Joy	17.533	13.63%	16.453	21.98%
Surprise	6.302	4.90%	1.128	1.51%
Disgust	27.365	21.27%	13.470	18.00%
Fear	28.383	22.07%	7.299	9.75%
Sadness	11.218	8.72%	8.086	10.80%
Anger	0.931	0.72%	0.499	0.67%
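The Table 6 proportions are consistent with dividing each category score by the sum of the seven scores for that stage (for the rainstorm day, 36.895 / 128.627 ≈ 28.68% for Good). The sketch below pairs that normalization with a toy lexicon-based scorer that sums word intensities per category. The lexicon entries, intensities, English tokens and function names are all hypothetical; the study's actual lexicon and scoring rules are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> (sentiment category, intensity).
# These entries are illustrative only, not the study's lexicon.
TOY_LEXICON = {
    "scared": ("Fear", 1.0),
    "dangerous": ("Fear", 0.8),
    "relieved": ("Joy", 1.0),
    "thanks": ("Good", 0.6),
    "annoying": ("Disgust", 0.7),
}


def score_text(tokens, lexicon=TOY_LEXICON):
    """Sum lexicon intensities per sentiment category for one tokenized text."""
    scores = defaultdict(float)
    for tok in tokens:
        if tok in lexicon:
            category, weight = lexicon[tok]
            scores[category] += weight
    return dict(scores)


def category_proportions(scores):
    """Convert per-category score totals into percentages of the stage total."""
    total = sum(scores.values())
    return {cat: 100 * s / total for cat, s in scores.items()}


# Rainstorm-day category scores transcribed from Table 6.
RAINSTORM_DAY = {"Good": 36.895, "Joy": 17.533, "Surprise": 6.302,
                 "Disgust": 27.365, "Fear": 28.383, "Sadness": 11.218,
                 "Anger": 0.931}

if __name__ == "__main__":
    print(score_text(["scared", "dangerous", "thanks", "rain"]))
    print({k: round(v, 2) for k, v in category_proportions(RAINSTORM_DAY).items()})
```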

Table 7. T-test for sentiment comparison between impact-related texts and unrelated texts.

Sentiment Category	Life & Property Loss: Mean Difference	Life & Property Loss: p	Traffic: Mean Difference	Traffic: p	Work: Mean Difference	Work: p	Education: Mean Difference	Education: p	Infrastructure Failure: Mean Difference	Infrastructure Failure: p
Good	−0.16	0.234	−0.18	0.155	−0.38	0.005 **	−0.4	0.043 *	−0.43	0.043 *
Disgust	0.34	0.005 **	0.06	0.485	0.75	0.000 **	−0.06	0.429	−0.11	0.417
Sadness	0.48	0.000 **	−0.07	0.253	0	0.984	0.08	0.462	0.01	0.916
Surprise	−0.1	0.142	−0.12	0.093	−0.1	0.097	−0.16	0.204	−0.17	0.082
Fear	−0.34	0.000 **	−0.43	0.000 **	−0.48	0.000 **	−0.48	0.000 **	−0.66	0.000 **
Anger	−0.01	0.372	−0.02	0.108	−0.02	0.062	−0.03	0.308	0	0.795
Joy	−0.41	0.000 **	−0.47	0.000 **	−0.22	0.05	0	0.977	−0.65	0.000 **

Notes: * p < 0.05; ** p < 0.01.
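Table 7 reports mean differences and p-values between impact-related and unrelated texts. This section does not state which two-sample t-test variant was used; as an assumption, the sketch computes Welch's t statistic, which does not require equal variances, together with its Welch-Satterthwaite degrees of freedom. Turning t into a p-value needs the Student's t distribution, which the Python standard library lacks; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full Welch test. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for the mean difference mean(a) - mean(b)."""
    # Squared standard errors of each sample mean (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical per-text sentiment scores, not data from the study.
    print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

A negative t, like a negative mean difference in Table 7, means the first group (here, impact-related texts under the assumed ordering) has the lower mean score.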

Table 8. Number and proportion of waterlogging spots in each district.

District	Number of Waterlogging Spots	Proportion
Huangpu	81	35.5%
Zengcheng	41	17.8%
Tianhe	46	21.5%
Panyu	10	4.1%
Liwan	3	1.2%
Haizhu	15	7.0%
Huadu	4	1.7%
Baiyun	19	8.3%
Conghua	1	1.2%
Yuexiu	2	1.2%
Nansha	1	0.4%

Table 9. Examples of different waterlogging spot types.

Waterlogging Spot Type	Waterlogging Spot Examples
Traffic-related area spot	Guanhu Metro Station, Nangang Bus Terminal, Guangyuan Highway, Kaiyuan Avenue, Bandao Tunnel
Residential area spot	Fengle Gated Neighborhood, Haiyuan Road Gated Neighborhood, Tangde Gated Neighborhood, Bishan Urban Village, Mupi Urban Village, Huangma Urban Village, Shuinan Urban Village
Public area spot	Tianhe Park, Daguan School, Fuyi Square, Xintang Square, Shunxin Shopping Mall

Table 10. Global Moran’s I of waterlogging spots at the analysis scales of 1 km, 2 km, and 3 km.

Statistic	Value
Moran’s I	0.404
z-value	27.459
p-value	0.000
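Table 10 reports a positive global Moran's I (0.404, z = 27.459), which indicates that impacted areas tend to cluster in space. The sketch below computes the statistic directly from its definition for a small toy example. The weights scheme (binary contiguity), the variable being tested and the function names are assumptions, since this section does not specify the grid variable or the weights used. The z-score additionally requires the variance of I under a normality, randomization or permutation assumption, which the sketch leaves out.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij:

        I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i^2,
        z_i = x_i - mean(x),  S0 = sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def chain_weights(n):
    """Hypothetical binary contiguity weights for n cells in a row."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    print(morans_i([1, 1, 0, 0], chain_weights(4)))  # clustered values
    print(morans_i([1, 0, 1, 0], chain_weights(4)))  # alternating values
```

For four cells in a row, clustered values [1, 1, 0, 0] give I = 1/3 and alternating values [1, 0, 1, 0] give I = −1; with no spatial autocorrelation the expected value is −1/(n − 1).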

Table 11. Binary logistic regression results for predisposing factors affecting waterlogging occurrences.

Factor	Estimate	Std. Error	z	p
Precipitation	5.873	0.608	9.652	0.000 **
Elevation	−3.147	4.015	−0.784	0.433
Slope	2.589	1.939	1.335	0.182
NDVI	−1.673	1.427	−1.172	0.241
Water Surface	0.498	1.001	0.498	0.619
Population Density	3.56	0.871	4.089	0.000 **
Built-up Surfaces	4.92	0.8	6.15	0.000 **
Road Density	1.956	0.766	2.554	0.011 *
(Intercept)	−10.511	1.407	−7.469	0.000 **

Notes: *: p < 0.05; **: p < 0.01.
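The z and p columns of Table 11 follow the usual Wald test for logistic regression coefficients: z = Estimate / Std. Error, with a two-sided p-value from the standard normal distribution. The sketch below reproduces the reported p-values to three decimals from the transcribed estimates and standard errors; small differences in z (for example 9.66 versus 9.652 for precipitation) are consistent with rounding of the reported standard errors. The function names are hypothetical, and the odds-ratio helper is an added illustration whose per-unit meaning depends on how the predictors were scaled, which this section does not state.

```python
import math

# (estimate, standard error) transcribed from Table 11.
COEFFICIENTS = {
    "Precipitation": (5.873, 0.608),
    "Elevation": (-3.147, 4.015),
    "Slope": (2.589, 1.939),
    "NDVI": (-1.673, 1.427),
    "Water Surface": (0.498, 1.001),
    "Population Density": (3.56, 0.871),
    "Built-up Surfaces": (4.92, 0.8),
    "Road Density": (1.956, 0.766),
    "(Intercept)": (-10.511, 1.407),
}


def wald_test(estimate, std_error):
    """Wald z statistic and two-sided p-value against a standard normal."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def odds_ratio(estimate):
    """Multiplicative change in the odds per one-unit increase in the predictor."""
    return math.exp(estimate)


if __name__ == "__main__":
    for name, (est, se) in COEFFICIENTS.items():
        z, p = wald_test(est, se)
        print(f"{name}: z = {z:.3f}, p = {p:.3f}")
```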

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

