Spatiotemporal Changes of Urban Rainstorm-Related Micro-Blogging Activities in Response to Rainstorms: A Case Study in Beijing, China

This paper examines rainstorm-related micro-blogging activities in response to rainstorms in an urban environment at fine spatial and temporal scales. The results could support disaster assessment and mitigation decision making.

Abstract: Natural disasters cause significant casualties and losses in urban areas every year. Further, the frequency and intensity of natural disasters have increased significantly over the past few decades in the context of global climate change. Understanding how urban dwellers learn about and respond to a natural hazard is of great significance as more and more people migrate to cities. Social media has become one of the most important communication platforms in the virtual space for users to share their knowledge, information, and opinions about almost everything in the physical world. Geo-tagged posts published on different social media platforms contain a huge amount of information that can help us better understand the dynamics of collective geo-tagged human activities. In this study, we investigated the spatiotemporal distribution patterns of collective geo-tagged human activities in Beijing when it was afflicted by the "6-22" rainstorm. We used a variety of machine learning and statistical methods to examine the correlations between rainstorm-related microblogs and rainstorm characteristics at fine spatial and temporal scales across Beijing. We also studied factors that could explain the changes in rainstorm-related blogging activities. Our results show that the human response to a disaster is very consistent, though with certain time lags, between the virtual and physical spaces at both the grid and city scales. This consistency varies significantly across our study area.


Introduction
Natural disasters such as hurricanes, floods, and tornadoes can cause significant loss of life, property damage, and even political instability [1][2][3][4]. In the context of global climate change, natural disasters have become more frequent and pose increasing physical, social, and economic threats to human society [5,6]. Closely monitoring how natural disasters disrupt human activities is thus of great value [7][8][9], particularly in urban environments, where human-environment relationships are usually much more complex.

Study Area
This study examined Weibo users' blogging activities in response to the rainstorms that hit Beijing from 21 to 24 June 2017. We chose the main urban area of Beijing as the study area, as shown in Figure 1 (the water ponding points and major hubs in the main urban area are marked in Figure 1). Beijing, the capital of China, has grown significantly over the past decade. The share of developed land in Beijing increased from 7.9% in 2005 to 16.3% in 2015, and the total population increased from 15.38 million in 2005 to 21.705 million in 2015. Beijing is also one of the most economically active cities in China, with total GDP growing from $975 billion in 2005 to $3270 billion in 2015.
Dwellers in Beijing use a variety of social media platforms daily. Sina Weibo is one of the most popular, allowing people to stay in touch and share information about ongoing events with each other. As of 2018, Sina Weibo had 462 million active users in China, and an average of 1.30 million words were posted on it every day [42].

Data
A total of 3.32 million Weibo posts geotagged within Beijing were crawled from the Sina Weibo platform. All posts were published from June to September 2017, the period during which Beijing receives most of its annual rainfall. Each post comes with information on its user, publishing location, publishing time, and text.
We also collected rainfall, points of interest (POIs), and water ponding site data. We used two precipitation datasets in this study. The hourly precipitation data at meteorological stations were collected from the China Meteorological Data Network (http://data.cma.cn/). The 1-h cumulative precipitation dataset was generated from the weather radar in Daxing, a town located 13 km south of the city. The radar covers the entire Beijing area and, after image processing and registration, provides precipitation data at a spatial resolution of 1.051 × 1.051 km. We adopted this resolution as the grid scale of the study, and all other data sources were processed to the same resolution for further analysis.
The POIs include the locations of businesses, educational institutions, residential areas, transportation facilities, open spaces, and others. The dataset was produced mainly for navigation by Beijing NavInfo Co., Ltd, Beijing, 110000. Each POI comes with its coordinates (latitude and longitude), type, name, address, and a flag indicating its importance level. In this paper, we categorized the POIs into five classes (Table 1).

Methods
Figure 2 shows our data processing and analysis workflow. We first used ArcGIS 10.5 to aggregate geotagged Weibo posts to grids with the same resolution (1.051 × 1.051 km) as the rainfall data. We then used a support vector machine (SVM) classifier to identify and extract rainstorm-related microblogs. Next, we analyzed how city dwellers responded to the rainstorms in terms of changes in the number of rainstorm-related microblogs at the grid and city scales. Finally, we investigated the factors that could explain the city dwellers' response to the rainstorms.
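The gridding step itself was done in ArcGIS 10.5. As a rough illustration of the same idea, the sketch below counts points per square cell of 1.051 km. It assumes the post coordinates have already been projected to a metric coordinate system (x, y in metres) and that the grid starts at a chosen lower-left corner `origin`; the function name `aggregate_to_grid` is hypothetical and not part of the authors' workflow.

```python
from collections import Counter
from typing import Iterable, Tuple

CELL_SIZE_M = 1051.0  # grid resolution used in the paper (1.051 km)


def aggregate_to_grid(
    points: Iterable[Tuple[float, float]],
    origin: Tuple[float, float],
    cell_size: float = CELL_SIZE_M,
) -> Counter:
    """Count points per grid cell.

    Assumes projected coordinates in metres. A point falls in cell (col, row)
    when it lies in [origin + col * cell_size, origin + (col + 1) * cell_size)
    along each axis; floor division keeps this correct for negative offsets.
    """
    ox, oy = origin
    counts: Counter = Counter()
    for x, y in points:
        col = int((x - ox) // cell_size)
        row = int((y - oy) // cell_size)
        counts[(col, row)] += 1
    return counts


# Toy example: four posts on a grid anchored at (0, 0).
posts = [(10.0, 20.0), (500.0, 900.0), (1100.0, 50.0), (2200.0, 2200.0)]
counts = aggregate_to_grid(posts, origin=(0.0, 0.0))
# counts[(0, 0)] == 2, counts[(1, 0)] == 1, counts[(2, 2)] == 1
```

Using the same `origin` and `cell_size` for posts, rainfall, and POIs is what keeps all layers aligned on one grid.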
Appl. Sci. 2019, 9, x FOR PEER REVIEW 4 of 16

Extraction of Rainstorm-Related Weibo Posts
In total, we crawled 3.32 million Weibo posts that were published from June to September 2017 and geotagged within Beijing. We then filtered them using keywords such as thunderbolt, storm, water, and rainfall, which yielded around 8000 posts possibly related to the rainstorms.
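The keyword pre-filter can be sketched as a case-insensitive substring match. This is only an illustration: `KEYWORDS` holds just the four examples named in the text, the real list is not given, and actual Weibo posts are in Chinese, so matching would use Chinese terms. `prefilter` is a hypothetical name.

```python
from typing import Iterable, List

# Only the example keywords named in the text; the full list is not specified.
KEYWORDS = ("thunderbolt", "storm", "water", "rainfall")


def prefilter(posts: Iterable[str], keywords: Iterable[str] = KEYWORDS) -> List[str]:
    """Keep posts whose text contains any keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return [p for p in posts if any(k in p.lower() for k in lowered)]


candidates = prefilter(["Heavy Rainfall downtown", "lunch time", "Water on the road"])
# candidates == ["Heavy Rainfall downtown", "Water on the road"]
```

Substring matching is deliberately permissive (for example, "brainstorm" also matches "storm"), which is why a manually checked sample and a trained classifier follow this step.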
We then randomly selected 2000 of the 8000 posts and manually checked each one, labeling a post "true" if it was related to a rainstorm and "false" otherwise. The 2000 labeled posts were then evenly divided into two subsets, which were used to train the SVM classifier and to validate the classification results, respectively.
SVM classifiers have been used in previous studies to label microblogs as either event-related or event-independent [18,43]. We used a nonlinear SVM with a radial basis function (RBF) kernel. Essentially, the classifier finds the optimal hyperplane in the kernel-induced feature space that best separates rainstorm-related posts from non-related ones. The classifier has two hyperparameters: C, which sets the penalty on misclassified training samples, and gamma, which controls the influence range of a single training sample. The two hyperparameters were tuned with the GridSearchCV method [44] on the training subset. The validation subset was then used to evaluate the classification accuracy using five-fold cross validation [45]. We obtained an F-score of 0.85, which indicates that the SVM classifier could be used to identify rainstorm-related Weibo posts. The final SVM model was then applied to all unlabeled Weibo posts. In total, 6072 of the 8000 posts were found to be rainstorm-related during the period from June to September 2017.
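The F-score reported above is the harmonic mean of precision and recall for the rainstorm-related class. The stdlib-only sketch below shows that computation on binary labels (True = rainstorm-related); it is not the authors' evaluation code. In scikit-learn, the corresponding pieces would be `sklearn.svm.SVC(kernel="rbf")`, `sklearn.model_selection.GridSearchCV` over `C` and `gamma`, and `sklearn.metrics.f1_score`.

```python
from typing import Sequence


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 = 2PR / (P + R) for the positive class (True)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy validation labels: 3 true positives, 1 false positive, 1 false negative.
truth = [True, True, True, True, False, False, False, False]
pred = [True, True, True, False, True, False, False, False]
score = f1_score(truth, pred)  # precision = recall = 0.75, so F1 = 0.75
```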

Weibo Blogging Index
We used two indexes to measure blogging activities in response to the rainstorms. The first index, the human's event response index (HERI), is defined as the ratio of the standardized number of rainstorm-related Weibo posts (RRWP) to the standardized total number of posts within a specific cell. (In the standardization process, we normalized the total number of Weibo posts and the number of RRWP to the range 0-1 for each grid.)

$$\mathrm{HERI} = \frac{\text{Standardized number of RRWP}}{\text{Standardized total number of Weibo posts}} \tag{1}$$

The HERI can be used to measure human response intensity: a higher HERI value indicates that city dwellers are more active in blogging about the rainstorms. A very similar index was used to estimate hazard-induced damages and monitor post-hazard recovery speed [40,46].
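Equation (1) can be sketched as follows. The text does not say exactly how the min-max standardization is applied, so this is one possible reading: counts are scaled to 0-1 across cells, and the ratio is left undefined (None) where the standardized total is 0. The names `min_max` and `heri` are hypothetical.

```python
from typing import List, Optional, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def heri(rrwp: Sequence[float], total: Sequence[float]) -> List[Optional[float]]:
    """HERI per cell = standardized RRWP / standardized total posts (Eq. 1).

    Assumes min-max standardization across cells; returns None where the
    standardized total is zero, because the ratio is undefined there.
    """
    s_r, s_t = min_max(rrwp), min_max(total)
    return [r / t if t > 0 else None for r, t in zip(s_r, s_t)]


# Four toy cells: RRWP counts and total post counts.
values = heri([0, 5, 10, 2], [10, 60, 110, 210])
# values == [None, 2.0, 2.0, 0.2] (up to floating-point rounding)
```

Because the denominator is itself standardized, HERI can exceed 1, which is consistent with the grid-level range reported later.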
The HERI can be significantly affected by the rainfall amount. We therefore used a second index, the event normalized response relation (ENRR), to evaluate the human response to a rainstorm while removing the bias introduced by variations in rainfall amount. The ENRR is expressed as the relationship between the HERI level and the rainfall level of each cell. Both the HERI and rainfall values were first classified into three levels (high, medium, and low) using the Jenks Natural Breaks method, which groups data into classes by minimizing the variance within each class and maximizing the variance between classes. The combinations of the three HERI levels and the three rainfall levels reflect how dwellers respond to a rainstorm that brings different rainfall amounts across the study area. Figure 3 shows the nine relationships represented by the ENRR. We mapped the ENRR class of each grid to show how the relationship between HERI and rainfall intensity varies across regions.
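The ENRR classification can be sketched as two three-class natural-breaks classifications followed by pairing. The paper does not say which Jenks implementation was used; the sketch below uses an exact dynamic-programming (Fisher-Jenks) version that is O(k·n²) and suited to illustration rather than large grids. Representing each ENRR class as a (HERI level, rainfall level) pair is our assumption about how the nine relationships in Figure 3 could be encoded, and all function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

LEVELS = ("low", "medium", "high")


def jenks_breaks(values: Sequence[float], n_classes: int = 3) -> List[float]:
    """Exact natural breaks (Fisher-Jenks) by dynamic programming.

    Minimizes the total within-class sum of squared deviations of the sorted
    values and returns the upper bound of each class, in ascending order.
    """
    data = sorted(values)
    n = len(data)
    if n < n_classes:
        raise ValueError("need at least n_classes values")
    # Prefix sums give the squared-deviation cost of any run in O(1).
    s1 = [0.0] + list(accumulate(data))
    s2 = [0.0] + list(accumulate(v * v for v in data))

    def ssd(i: int, j: int) -> float:
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal cost of splitting data[:j] into k non-empty classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], start[k][j] = c, i
    # Walk back from the last class to recover each class's upper bound.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = start[k][j]
    return bounds[::-1]


def level(value: float, bounds: Sequence[float]) -> str:
    """Label a value low/medium/high given three ascending upper bounds."""
    for name, upper in zip(LEVELS, bounds):
        if value <= upper:
            return name
    return LEVELS[-1]


def enrr(heri: Sequence[float], rain: Sequence[float]) -> List[Tuple[str, str]]:
    """ENRR class of each cell as a (HERI level, rainfall level) pair."""
    hb, rb = jenks_breaks(heri), jenks_breaks(rain)
    return [(level(h, hb), level(r, rb)) for h, r in zip(heri, rain)]
```

For example, `jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22])` returns `[3, 12, 22]`, so a cell with HERI 1 and rainfall 22 would fall in the (low HERI, high rainfall) class.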

Statistical Analysis
We used a variety of conventional and spatial statistical methods to evaluate spatial differences in blogging activities in response to the rainstorms across the study area. The hourly rainfall and the corresponding hourly RRWP were each divided into four groups according to their quartiles, from which a confusion matrix was constructed. We then used the weighted Kappa coefficient to evaluate the agreement between the rainfall levels and the RRWP levels.
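The quartile grouping and weighted Kappa can be sketched with the standard library. The text does not state the weighting scheme or how quartiles were computed, so this sketch assumes linear disagreement weights and Python's `statistics.quantiles` default ("exclusive") method; the function names are hypothetical. Cross-tabulating the two level sequences gives the confusion matrix.

```python
from bisect import bisect_left
from statistics import quantiles
from typing import List, Sequence


def quartile_levels(values: Sequence[float]) -> List[int]:
    """Map each value to a quartile level 0-3 (0 = lowest quarter)."""
    cuts = quantiles(values, n=4)  # Q1, Q2, Q3 with the default 'exclusive' method
    return [bisect_left(cuts, v) for v in values]


def weighted_kappa(a: Sequence[int], b: Sequence[int], k: int = 4) -> float:
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1).

    a and b are paired ordinal levels in 0..k-1 (e.g., the rainfall level and
    the RRWP level of the same hour). Undefined if every pair falls in one level.
    """
    n = len(a)
    obs = [[0.0] * k for _ in range(k)]  # observed proportions (confusion matrix / n)
    for i, j in zip(a, b):
        obs[i][j] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    observed = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected


# Toy hourly series: rainfall amounts and RRWP counts.
rain = [0.0, 0.5, 2.0, 4.0, 8.0, 12.0, 20.0, 35.0]
posts = [0, 1, 1, 3, 5, 9, 14, 30]
kappa = weighted_kappa(quartile_levels(rain), quartile_levels(posts))
```

Perfect agreement gives 1, agreement no better than chance gives 0, and systematic reversal gives negative values; quadratic weights would penalize large level gaps more heavily.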
Quantile regression was used to estimate the conditional quantiles (0.05, 0.25, 0.50, 0.75, 0.9, and 1) of the number of posts given the rainfall amount, which characterizes both the central tendency and the dispersion of the response. Unlike ordinary least squares, which models only the conditional mean, quantile regression describes how the whole range of the dependent variable varies with the independent variable. We then used receiver operating characteristic (ROC) curves [48,49] to determine the range within which water ponding sites and major transportation hubs affect blogging activities in response to the rainstorm. The optimal threshold weights sensitivity and specificity equally and corresponds to the point on the ROC curve closest to the top-left corner, i.e., the perfect classification where sensitivity and specificity both equal 1. In addition, we performed hotspot analysis based on the HERI and the ENRR.
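The closest-to-top-left rule for the ROC threshold can be sketched as below. The text does not define the binary outcome or the predictor; this sketch assumes each grid is labeled as high response or not, and that the predictor is the grid's distance to the nearest water ponding site (or hub), with a grid predicted positive when that distance is at or below the threshold. `optimal_distance_threshold` is a hypothetical name.

```python
from math import hypot
from typing import Sequence, Tuple


def optimal_distance_threshold(
    distances: Sequence[float], labels: Sequence[bool]
) -> Tuple[float, float, float]:
    """Pick the distance cut-off whose ROC point is closest to (0, 1).

    A cell is predicted positive when its distance to the nearest feature is
    <= threshold. Needs at least one positive and one negative label.
    Returns (threshold, sensitivity, specificity).
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    best = None
    for t in sorted(set(distances)):
        tp = sum(1 for d, y in zip(distances, labels) if d <= t and y)
        tn = sum(1 for d, y in zip(distances, labels) if d > t and not y)
        sens, spec = tp / pos, tn / neg
        # ROC point is (1 - spec, sens); distance to the perfect point (0, 1).
        gap = hypot(1 - sens, 1 - spec)
        if best is None or gap < best[0]:
            best = (gap, t, sens, spec)
    _, t, sens, spec = best
    return t, sens, spec


# Toy grids: distance (m) to the nearest ponding site and whether response was high.
dist = [100, 200, 300, 400, 800, 900, 1000, 1200]
high = [True, True, True, False, False, False, False, False]
threshold, sensitivity, specificity = optimal_distance_threshold(dist, high)
# threshold == 300, sensitivity == 1.0, specificity == 1.0
```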

In this paper, we examined blogging activities in response to the rainstorms at both the city and grid levels. The city extent is defined by the administrative boundary of Beijing. Within the city, the rainfall and Weibo posts were aggregated to individual grids of 1.051 km × 1.051 km. There are 2776 grids within our study area, covering a total area of 2749.23 km².

Analysis and Results
Five heavy rainstorms hit Beijing on 22 June, 6 July, 20 July, 2 August, and 22 August (Figure 4). The 22 June rainstorm brought record-breaking precipitation, flooded the city, and caused significant economic losses. When the city was afflicted by the 22 June rainstorm, Weibo users posted over 1000 blogs, the most among all rainstorm events that hit Beijing in summer 2017. In this study, we mainly focus on the blogging activities in response to the 22 June rainstorm.
Figure 6 shows the time series of hourly rainfall and the hourly number of RRWP from 20 to 28 June. No RRWP was found on the social media platform before the rainstorm hit the city. Rainstorm-related blogging activities were first detected when the first rainstorm warning was issued at 4:20 p.m. on 21 June. Blogging activities intensified significantly, particularly during the most intense period of the rainstorm, from 19:00 on 21 June to 04:00 on 24 June. This period accounts for 90.5% of the total city rainfall brought to Beijing by the "6.22" rainstorm and about 91.6% of the RRWP. Figure 6 also shows that variations in rainfall amount are generally consistent with blogging activities, though with an apparent 1-h time lag. Blogging activities were most intense about 10 min before the release of the rainstorm warning. A high rainfall amount is not always accompanied by strong blogging activity, particularly when rain falls between late night and early morning or when the rainstorm hits suburbs with low population flows.
The confusion matrix (Figure 7) between the rainfall levels and the RRWP levels shows that higher rainfall levels are generally associated with more RRWP. We found 38 h with both higher rainfall amounts and more RRWP, while lower rainfall levels are associated with fewer RRWP. A statistically significant weighted Kappa coefficient of 0.63 indicates that the levels of blogging activity are consistent with the rainfall levels across the city.

Blogging Activities at City Level
Figure 8 shows the correlation coefficients between the number of RRWP and rainfall amount for time lags of up to 6 h. The coefficients drop as the time lag increases, although all correlations are statistically significant at the 0.01 level. The highest coefficient (0.653) occurs at a time lag of 1 h, suggesting that more RRWP were posted one hour after heavy rainfall. In other words, heavy rainfall usually triggers intensified blogging activity one hour later. A plausible explanation is that, about 1 h into the rainstorm, the city starts to be afflicted by issues such as waterlogging and traffic congestion, which tend to intensify rainstorm-related blogging activity.
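The lagged correlation can be sketched by pairing rainfall at hour t with RRWP at hour t + lag and computing a Pearson coefficient for each lag from 0 to 6 h. This is an illustration on toy data with hypothetical function names; it does not compute the significance tests reported above.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(
    rain: Sequence[float], posts: Sequence[float], max_lag: int = 6
) -> Dict[int, float]:
    """Correlate rainfall at hour t with RRWP at hour t + lag, for lag = 0..max_lag."""
    if len(rain) != len(posts):
        raise ValueError("series must have the same length")
    return {
        lag: pearson(rain[: len(rain) - lag], posts[lag:])
        for lag in range(max_lag + 1)
    }


# Toy hourly series in which posts simply repeat rainfall one hour later.
rain = [0, 2, 8, 5, 1, 0, 0, 3, 0, 0]
posts = [0] + rain[:-1]
r = lagged_correlations(rain, posts)
best_lag = max(r, key=r.get)  # the lag with the highest coefficient, here 1
```

Each extra hour of lag drops one pair from the comparison, so long lags on short series rest on fewer observations.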
Quantile regression analysis between the rainfall amount and the number of RRWP (Figure 9) shows steeper slopes for the higher quantiles. In other words, increased rainfall has a stronger impact on the number of RRWP when the rainfall is heavier. By contrast, when the rainfall is below the 30th percentile (the average grid rainfall in the study area is 8.6), an increase in rainfall amount has little impact on the number of RRWP. Once the rainfall exceeds the 30th percentile, the number of RRWP starts to increase, and the regression slope becomes steeper as the rainfall percentile increases. That is, rainfall above the 30th percentile tends to trigger Weibo users to post many more RRWP.
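To show what a single quantile line is, the sketch below fits y ≈ a + b·x by minimizing the pinball (check) loss. It relies on a known property of the underlying linear program: for a two-parameter model, some optimal line passes through two observations with distinct x, so trying every such pair is exact but O(n³) and only suitable for tiny samples. The paper does not name its software; in practice one would use a dedicated solver such as statsmodels' `QuantReg`. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def pinball_loss(residuals: Sequence[float], tau: float) -> float:
    """Check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_line(x: Sequence[float], y: Sequence[float], tau: float) -> Tuple[float, float]:
    """Exact linear quantile regression y ~ a + b * x for small samples.

    Tries every line through two observations with distinct x and returns
    the (intercept, slope) with the lowest pinball loss.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = pinball_loss([yi - a - b * xi for xi, yi in zip(x, y)], tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Median line on toy data with one extreme value.
a, b = quantile_line([1, 2, 3, 4, 5], [1, 2, 3, 4, 100], tau=0.5)
# (a, b) == (0.0, 1.0): the median fit ignores the outlier
```

Fitting the same data at several values of `tau` and comparing slopes is how differences such as "steeper slopes at higher quantiles" are read off.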

Human Response at Grid Scale
To explore how human response intensity at the grid scale differs across times of day, we first divided the day into four time periods (08-10, 11-16, 17-20, and 21-07) and then mapped the RRWP of each period as a dot map. Figure 10 shows the results. In the study area, the morning rush hours (08-10) and the evening rush hours (17-20) show the strongest human response intensity. During these two periods, important transportation hubs, commercial business centers, large jobs-housing areas, and water ponding sites became high-response regions in the main urban area of Beijing. In addition, dense points cluster around the subway system and along major roads, reflecting how the rainstorm obstructed traffic and delayed travel. Points in the second period (11-16) are mainly distributed near transit stations and the more severely affected areas, while points in the last period are sparsely distributed across the study area. We also found that at major transportation stations such as airports, railway stations, and bus stations, the point density was relatively uniform across the four periods, whereas large jobs-housing areas showed dense points mainly during the morning and evening rush hours.
Figure 11 shows the correlation between the rainfall amount and the number of RRWP at the grid scale. The correlation coefficients vary between −0.14 and 0.86, with an average of 0.22. Negative or no correlation is mainly found in suburbs such as the Changping, Huairou, and Miyun Districts. By contrast, higher correlation coefficients are mainly found in populated areas within the city, including dense residential communities, important transportation hubs, and areas that news reports identified as significantly impacted by the rainstorms.
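Assigning posts to the four time periods is a simple lookup, with one subtlety: the last period wraps across midnight. The sketch assumes each label covers whole hours inclusively (for example, "08-10" means 08:00 to 10:59), which the text does not state explicitly; `time_period` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def time_period(hour: int) -> str:
    """Map an hour of day (0-23) to one of the four periods used for the dot maps."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0-23")
    if 8 <= hour <= 10:
        return "08-10"
    if 11 <= hour <= 16:
        return "11-16"
    if 17 <= hour <= 20:
        return "17-20"
    return "21-07"  # 21:00-07:59, wrapping across midnight


def posts_per_period(hours: Iterable[int]) -> Counter:
    """Count posts in each period given their publishing hours."""
    return Counter(time_period(h) for h in hours)


counts = posts_per_period([8, 9, 18, 23, 2, 12])
# counts == Counter({"08-10": 2, "21-07": 2, "17-20": 1, "11-16": 1})
```

Note that the periods differ in length (3, 6, 4, and 11 h), so raw counts per period are not directly comparable without normalizing by duration.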

Figure 12a shows the HERI across our study area. Only the 2203 grids with at least 10 daily Weibo posts are selected to calculate the HERI. At the grid scale, HERI values range between 0 and 9.83 with an average value of 1.23.
Regions with higher HERI values fall mainly into three types of areas in our study area. The first comprises areas with more rainstorm-induced damage, including serious house collapses, road blockages, and mudslides; these are mainly in suburbs such as the Fangshan and Mentougou Districts. The second comprises densely populated regions, including Zhongguancun, the CBD, the Tongzhou residential area, and Internet technology parks. The third comprises regions with important transportation hubs, such as subway stations, train stations, and airports. Figure 12b shows the results of the HERI hotspot analysis. Hotspots are mainly located in densely populated areas and important residential and work areas, such as the Tongzhou residential area, the CBD, and IT parks. Notably, many hotspots are found in the urban core. By contrast, cold spots are mainly found in the remote suburbs of the study area, which have small populations and thus limited human activity.
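The section does not name the statistic behind the HERI hotspot analysis. A common choice for gridded data is the Getis-Ord Gi* z-score, sketched below as an assumption rather than the authors' confirmed method. The weights are also assumed: binary queen contiguity, where each cell is weighted 1 together with its up to eight neighbours among the selected grids. The function names are hypothetical.

```python
from math import sqrt


def getis_ord_gi_star(values):
    """values: dict (row, col) -> HERI. Returns dict (row, col) -> Gi* z-score.

    Assumed weights: 1 for the cell itself and for each of its queen
    neighbours present in `values`, 0 otherwise.
    """
    n = len(values)
    xs = list(values.values())
    mean = sum(xs) / n
    s = sqrt(sum(x * x for x in xs) / n - mean ** 2)
    z = {}
    for (r, c) in values:
        neigh = [values[(r + dr, c + dc)]
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (r + dr, c + dc) in values]
        w_sum = len(neigh)  # sum of binary weights
        w_sq = len(neigh)   # sum of squared binary weights (same for 0/1 weights)
        num = sum(neigh) - mean * w_sum
        den = s * sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        z[(r, c)] = num / den if den > 0 else 0.0
    return z


def classify(z, crit=1.96):
    """Label cells as hot or cold spots at roughly the 95% confidence level."""
    return {k: "hot" if v > crit else "cold" if v < -crit else "ns"
            for k, v in z.items()}
```

Under this reading, a hotspot is a cell whose neighbourhood sum of HERI is significantly higher than expected under spatial randomness. A cold spot is significantly lower. Cells labelled "ns" are not significant at the chosen threshold.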
Figure 12c shows the proportions of POI types within each identified hotspot. The HERI hotspots at Beijing Capital Airport, Yizhuang, and Changping-Shahe have the highest proportions of transportation POIs. The texts of the RRWPs within these hotspots suggest that the rainstorms may have significantly delayed commutes in these regions, prompting users to publish more RRWPs complaining about traffic. The hotspots in the Tongzhou residential area and Mentougou District show higher proportions of residential POIs, whereas hotspots in Zhongguancun, the Chaoyang CBD, and the IT park contain a highly mixed combination of residential, business, and education POIs. There is no significant difference in POI proportions among the other hotspots.
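The POI proportions per hotspot amount to a grouped count normalised by each hotspot's total. The sketch below makes two assumptions: each POI has already been assigned to a grid cell, and each hotspot is represented as a set of cells mapped to a place name. The names `poi_proportions` and `hotspot_of` are hypothetical.

```python
from collections import Counter


def poi_proportions(pois, hotspot_of):
    """pois: iterable of (cell, poi_type); hotspot_of: dict cell -> hotspot name.

    Returns dict hotspot name -> {poi_type: share of that hotspot's POIs}.
    """
    counts = {}
    for cell, poi_type in pois:
        name = hotspot_of.get(cell)
        if name is None:
            continue  # POI lies outside every identified hotspot
        counts.setdefault(name, Counter())[poi_type] += 1
    result = {}
    for name, c in counts.items():
        total = sum(c.values())
        result[name] = {t: k / total for t, k in c.most_common()}
    return result
```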
Appl. Sci. 2019, 9, x FOR PEER REVIEW 11 of 16
Figure 13a shows the bivariate relationship between the levels of HERI and rainfall amount. We classified HERI and rainfall amount separately into three groups (high, medium, and low) using Jenks natural breaks, which gives nine combinations of HERI and rainfall levels. The HH combination (high HERI and high rainfall) is located in suburbs such as Fangshan District and Xi'erqi. These areas were hit by heavy rain and suffered serious rainstorm-induced damage and losses. The HL combination (high HERI and low rainfall) is mainly found in densely populated areas, including the urban core, Tongzhou District, the CBD, and Beijing Capital Airport. The texts of the RRWPs show that people in these areas complained that the rainstorms caused significant traffic jams and disrupted their daily routines.
Passengers trapped at Beijing Capital Airport also published more RRWPs because of significant flight delays. The LH combination (low HERI and high rainfall) is mainly found in sparsely populated regions, where few RRWPs were posted because there are few Weibo users. Figure 13b shows the POI types in the different combinations of HERI and rainfall levels. Areas with higher HERI values tend to have more transportation POIs, regardless of the rainfall amount. By contrast, places with fewer RRWPs and higher rainfall are less populous and have more green space.
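Jenks natural breaks chooses class boundaries that minimise the within-class sum of squared deviations. For one-dimensional data, this minimisation can be solved exactly by dynamic programming (Fisher's method). The sketch below does this in pure Python with O(k·n²) work, which may take several seconds for about 2200 grids. Optimised implementations exist, for example the `FisherJenks` classifier in the mapclassify package. The two-letter labels follow the text's convention: HERI level first, rainfall level second. All function names are hypothetical.

```python
def jenks_breaks(values, k=3):
    """Exact 1-D partition into k classes minimising within-class squared
    deviation (the Jenks criterion). Returns each class's upper bound."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x^2
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):  # squared deviations of xs[i:j] around its mean
        t = s1[j] - s1[i]
        return (s2[j] - s2[i]) - t * t / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]  # cost[c][j]: xs[:j] in c classes
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                v = cost[c - 1][i] + ssd(i, j)
                if v < cost[c][j]:
                    cost[c][j], back[c][j] = v, i
    bounds, j = [], n
    for c in range(k, 0, -1):  # walk back from the last class
        bounds.append(xs[j - 1])
        j = back[c][j]
    return bounds[::-1]


LEVELS = ("L", "M", "H")


def level(x, bounds):
    """Map a value to L/M/H using three class upper bounds."""
    for label, upper in zip(LEVELS, bounds):
        if x <= upper:
            return label
    return LEVELS[-1]


def bivariate_classes(heri, rain):
    """heri, rain: dict grid_id -> value. Returns grid_id -> e.g. 'HL'."""
    hb = jenks_breaks(heri.values())
    rb = jenks_breaks(rain.values())
    return {g: level(heri[g], hb) + level(rain[g], rb) for g in heri if g in rain}
```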

Factors Influencing HERI
For all grids across the study area, the areas under the ROC curves (AUC) are 0.767 for water ponding sites and 0.733 for major transportation hubs (Figure 14). The most appropriate OIDF values for these two factors are 3400 m and 3200 m, respectively. Two-sided Welch's t-tests show that the HERI values of areas within the OIDF of a water ponding site and of a major transportation hub are both significantly higher than those of areas beyond these distances, at the 0.01 significance level. Table 2 shows the number and density of water ponding sites and major transportation hubs for each level combination of rainfall amount and RRWP. Level combinations with intense blogging activity are all associated with a high density of water ponding sites and major transportation hubs, regardless of the rainfall amount. By contrast, level combinations with inactive blogging are associated with a low density of water ponding sites and major transportation hubs.
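The section reports AUC values and "most appropriate" OIDF distances but does not state how grids were labelled or how the cutoff was chosen. The sketch below shows one plausible reading, with every step an assumption. Each grid is labelled high- or low-response by some HERI criterion. Its distance to the nearest site serves as the score, with shorter distance predicting high response. AUC is computed as the Mann-Whitney probability, and the cutoff is the distance that maximises Youden's J (TPR minus FPR). Welch's t statistic then compares HERI inside versus beyond that cutoff. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same two-sided Welch test. Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def roc_auc_and_cutoff(distances, positive):
    """distances: distance (m) from each grid to its nearest site.
    positive: True if the grid is labelled high-response (assumed criterion).
    Shorter distance is treated as predicting a positive grid.
    Returns (AUC, distance cutoff maximising Youden's J)."""
    pos = [d for d, p in zip(distances, positive) if p]
    neg = [d for d, p in zip(distances, positive) if not p]
    # AUC = P(a positive grid is closer than a negative grid); ties count 1/2.
    wins = sum(1.0 if dp < dn else 0.5 if dp == dn else 0.0
               for dp in pos for dn in neg)
    auc = wins / (len(pos) * len(neg))
    best_j, best_cut = -1.0, None
    for cut in sorted(set(distances)):
        tpr = sum(d <= cut for d in pos) / len(pos)
        fpr = sum(d <= cut for d in neg) / len(neg)
        if tpr - fpr > best_j:
            best_j, best_cut = tpr - fpr, cut
    return auc, best_cut


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Both loops are quadratic in the number of grids, which is acceptable for a few thousand cells. A sorted, rank-based AUC would scale better.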

Conclusions
In this study, we inferred human activities from rainstorm-related Weibo posts and examined how different levels of human activity are associated with different levels of rainfall at both the city and grid scales. The consistency between rainfall amount and human activity could be explained by the distribution of water ponding sites and major transportation hubs. Regions with a high density of water ponding sites and major transportation hubs tend to show an intense human response to rainstorms, as measured by the number of rainstorm-related Weibo posts. The intensity of the human response to rainstorm events also differed markedly across time periods and among areas with different attributes and functions. The response was significantly stronger during the morning and evening rush hours and was concentrated at important transportation hubs and water ponding sites. Rainstorm events thus have a large impact on travel. Analysis of the rainstorm-related posts suggests that water ponding sites and major transportation hubs have similar influence ranges (approximately 3.3 km).
We found that at the city scale, disaster impacts in physical space were highly consistent with activity in social media space. However, the intensity of the response on the social media platform differed across disaster stages, spatial areas, and time periods. At the grid scale, the impacts of an urban disaster vary greatly among different types of areas because of the complexity of human-land relationships. During a rainstorm, special areas such as urban water ponding points, traffic stations, major jobs-housing areas, important line sites, and some disaster sites experienced frequent secondary disasters and became the main areas of strong human response. These results show that the human response to urban rainstorm events varies in time and space at both the city and grid scales. Our findings on spatial consistency agree with previous studies [36,39,49]. Our further exploration of fine-scale spatiotemporal processes, and of the factors behind differences in human response, offers a new understanding of human-land relationships during such events at a fine scale.
This study has several limitations that further research could address to improve the framework. It examined only the number of rainstorm-related Weibo posts, without considering other information in the original posts, such as their emotions, themes, and other characteristics. Other multi-source spatial data, such as nighttime lights, ambient population, and road traffic congestion data, could, if successfully integrated, support a more comprehensive study of the human response to natural disasters. Integrating multi-source spatial data with more comprehensive data mining methods would also substantially reduce the uncertainty in the associations between human activities in the physical and virtual spaces.