1. Introduction
With worldwide urbanization, the proportion of urban population is projected to reach 66% by 2050 [1]. The rapidly growing population is making cities more crowded and increasing crowd-gathering risks [2]. According to incomplete global statistics, thousands of people die each year in emergencies occurring during crowd-gathering activities [3]. In open public spaces, which are open-boundary places, crowd access and departure are unregulated, making large and risky crowd gatherings more likely than in other places. Because of the high crowd density, a crowd accident, once it occurs, can cause severe casualties [4,5,6]. For example, in the Love Parade accident of 2010 in Duisburg, Germany, 21 people died and more than 500 people were injured [7]; in the stampede that occurred in the Chenyi Square of Shanghai, 36 people died and 49 people were injured [8]. The rising number of these accidents has begun to draw attention to crowd safety in open public spaces. The emergency management of public security in highly crowded open public spaces has gradually become a focus area in academia [9].
Urban open public spaces are the open spaces in cities that are open to all urban residents and have no specific boundaries [10]. Open public spaces are usually located near traffic facilities or commercial facilities in the center of a city, where crowd-gathering situations easily occur. Moreover, because of the open-boundary characteristic, the crowd flows in open public spaces are highly uncertain and crowd congestion easily occurs at bottlenecks, which raises the risk of stampede accidents.
Existing studies on crowd safety in open public spaces mainly focus on the observation and measurement of crowd traffic, computer simulation of crowd movements, and performance-based design of buildings [11]. For instance, Helbing et al. [12] proposed the social force model (SFM) to simulate and analyze pedestrian movement characteristics in both normal and evacuation situations. Liu and Kaneda [13] used an agent-based simulation model to test the efficiency of different spatial operations or management plans and improve crowd safety in high-density environments. Moreover, some scholars have conducted studies on risk assessment of crowd-gathering. Hu et al. [14] utilized two methods, the node access frequency method (NAFM) and the combined critical cluster model (CCCM), to identify key nodes with high crowd-gathering risk in an open public space in Beijing. Still et al. [15] proposed a crowd safety assessment model to estimate crowd safety in places of public assembly. According to classical risk theory, Liu et al. [16] selected four evaluation factors to build a quantitative crowd-gathering risk assessment model for urban public spaces. Zhang et al. [17] proposed a method to evaluate crowd safety based on fuzzy inference that considered the number of pedestrians and the distribution uniformity as two attributes of a crowd. The crowd-gathering risk in open public spaces is different from that of other public spaces that have specific boundaries and fences. The risk is closely related to both the internal and external environments of the open public spaces. For example, the serious casualties caused by the stampede in Shanghai in 2014 were not only due to the high crowd density in Chenyi Square. The high population density on the nearby roads and the high crowd density in the adjacent open public spaces, such as the Nanjing Road Pedestrian Street, made it very difficult to evacuate the internal crowd, which was also an important contributing factor to the stampede accident. However, previous studies mostly focused on a specific public space without simultaneously considering internal and external risk factors of crowd-gathering, and rarely addressed the crowd-gathering risk in a large region. Understanding the dynamic spatial distribution of crowd-gathering risk on a city scale is of great significance for improving urban planning and reducing the risk of crowd-gathering.
Prediction and early warning based on measurable parameters of the crowd-gathering status are the main technical means of reducing crowd-gathering risk [11]. The crowd density, i.e., the number of people per unit area, has been identified as one of the most important factors for assessing the risk of crowd-gathering in open public spaces [18,19,20]. However, the traditional crowd density estimation approaches have limitations when applied to open public spaces. For instance, the field survey approach is time-consuming and laborious, and it is difficult to use for open public spaces in a large region. The sensor detection approach is another traditional crowd density estimation method. This approach requires the allocation of sensors to count the people entering, which is more suitable for public spaces with access control. In open public spaces, however, people can enter and leave at will. Therefore, it is difficult to use this approach to count the number of people at a specific time and estimate the crowd density. The third traditional method is the video-based approach. Its limitations are that low-light conditions make it difficult to obtain an accurate estimate of the crowd density and that the whole public space is unlikely to be covered [21,22]. In recent years, with the development of wireless network technology, mobile devices such as smartphones have become widespread. Spatio-temporal big data based on mobile devices have been increasingly considered for monitoring the crowd density [21,23]. Tencent user density (TUD) data are big data generated from the locations of smartphone users who use Tencent applications [24]. TUD data have been widely utilized in research on urban population distribution mapping [25,26,27], identification of urban functional areas [28], park use measurement [29,30], estimation of potential victims of burglary [31], measurement of polycentric urban structures [32], urban land use classification [33], and other topics [24,34,35]. TUD data can cover a wide area with high temporal and spatial resolution. Hence, TUD data have the potential to be used for estimating the crowd densities in urban open public spaces.
Considering the limitations of existing studies, a new method to assess the rank and spatial distribution of crowd-gathering risk in open public spaces is proposed in this study. This method first introduces a new approach to estimate the crowd density in urban open public spaces based on spatio-temporal big data. Then, the crowd density estimation approach is integrated into the spatio-temporal crowd-gathering risk assessment method as an important factor. Following classical risk theory, the crowd-gathering risk assessment method jointly considers the occurrence probability of crowd-gathering events, the severity, and the risk aversion factor. Thematic maps describing the ranks and spatial distribution of crowd-gathering risk can be generated. A case study of the area within the Outer-ring Road of Shanghai was conducted to determine the feasibility of the new method. The paper is organized as follows: the proposed methodology is introduced in Section 2. In Section 3, the case study is implemented to verify the applicability of the method. The results of the risk assessment of crowd-gathering and the discussion are presented in Section 4. The conclusions and an outlook to the future are given in Section 5.
2. Methodology
To develop the methodology of this study, a TUD-data-based crowd density estimation method for open public spaces was first built. Because it is difficult to assess the dynamic crowd density merely by the TUD data, a TUD–population calibration method was built to describe the relation between the TUD data and the population data in residential land parcels, and calibrate the TUD data by a linear regression function. Utilizing the method, the dynamic crowd density in the urban open public spaces can be effectively estimated. After the crowd density estimation, a reasonable crowd density threshold can be delimited to detect critical crowd situations in open public spaces and determine the key open public spaces that need to have intensive crowd-gathering prevention. Secondly, the quantified risk assessment method of crowd-gathering for urban open public spaces was built on the basis of classical risk theory. Thematic maps that describe rank and spatial distribution of crowd-gathering risks can be generated. The risk levels and spatial distribution characteristics provided in the maps can help the government and relevant departments determine the crowd control measures in different open public spaces and regions to reduce the crowd-gathering risk.
Figure 1 shows the workflow of the method, which consists of two major parts: (1) crowd density estimation for open public spaces; (2) risk assessment of crowd-gathering in open public spaces.
2.1. Crowd Density Estimation for Open Public Spaces
2.1.1. TUD–Population Calibration Method
The Tencent user density (TUD) data, provided by Tencent Company in China (https://www.qq.com, accessed on 31 August 2019), are a kind of spatio-temporal big data with fine spatial and temporal resolution. The data reflect the density of smartphone users using Tencent applications. According to recent reports from Tencent, which is one of the largest Internet companies in the world, Tencent users account for over 70% of the total population in China and more than 90% in Chinese first-tier cities. In the Tencent data, the geographic space is divided into many square grids. The locations of users are recorded, and the number of users in each grid is converted into a density weight, which provides a dynamic user density distribution in the urban area. The TUD data have been proven to have high correlation with urban population distribution.
However, due to the encryption of the TUD data, the TUD values do not represent the real user numbers. Moreover, although the TUD data have high correlation with the population distribution, the TUD value of a grid cannot be directly utilized as the population or crowd number of that grid. According to a field investigation from another paper [29], about 20–30% of park users are not included in the TUD data, such as children, others without mobile devices, and those who do not use Tencent Apps. The TUD data therefore do not include everyone's location information, and a TUD–population calibration method is indispensable for estimating the crowd density of the open public spaces. The calibration method describes the relation between the TUD data and the real spatial population distribution. Two assumptions were made as follows:
(1) At nighttime (21:00–22:00 in this study), the urban residents are all distributed on residential land parcels: It is of high probability that people are in residential buildings during this period at nighttime. The floating population on residential land parcels is not considered in this study because of its random distribution and relatively smaller population size compared to the permanent population.
(2) For the estimated period in daytime and nighttime in this study, the relation between the TUD data and the real spatial population distribution is stable.
Based on these assumptions, the TUD–population calibration method, which builds a linear regression function between the census population data and nighttime TUD data, is represented by Equation (1) as follows:

$$P_i = a + b\,T_i \tag{1}$$

where the dependent variable $P_i$ is the population of the i-th subdistrict; the independent variable $T_i$ is the total TUD value count on the residential land parcels of the i-th subdistrict; $a$ is the constant value; $b$ is the coefficient for the total TUD value count of the i-th subdistrict. The reason why the nighttime TUD data are used here is that the urban residents are mostly in residential areas in the nighttime. In other words, the nighttime TUD data have the highest correlation with the urban residential population.
The regression analysis can be executed to estimate the mathematical relation between the TUD data and the urban population. After testing and verifying the method, Equation (2) can be used to distribute the population at each time. When calculating the population in each grid, the estimated grid population will be biased if the constant value $a$ of the given subdistrict is evenly assigned to each grid [27]. To avoid this problem, the constant value $a$ can be distributed among the grids within the given subdistrict according to the numerical value of the TUD grid, i.e., the grids with larger TUD values contribute more to the constant, while the grids with smaller TUD values contribute less. The mathematical equation is as follows:

$$P_{ij} = b\,T_{ij} + a\,\frac{T_{ij}}{\sum_{k} T_{ik}} \tag{2}$$

where $P_{ij}$ is the population of the j-th grid in the i-th subdistrict; $T_{ij}$ is the TUD value of the j-th grid in the i-th subdistrict; the sum in the denominator runs over all grids $k$ of the i-th subdistrict.
Figure 2 describes the application process of the TUD–population calibration method. First, an overlay analysis on three kinds of spatial data (nighttime TUD data, residential land parcels, and subdistrict boundaries) is conducted to obtain the total TUD values on the residential land parcels of each subdistrict. Because the population of each subdistrict can be acquired from census data, a linear regression function can be built to describe the mathematical relation between the population census data and the nighttime TUD data. Using this function, the TUD data in the estimated time periods can be calibrated to obtain the number of people in each grid of the study area.
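The calibration steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes the regression is fitted by ordinary least squares, and the function names `fit_calibration` and `grid_populations`, the variable names `a` and `b`, and all numbers are hypothetical.

```python
def fit_calibration(tud_totals, populations):
    """Fit population = a + b * tud_total by ordinary least squares.

    tud_totals:  nighttime TUD totals on residential parcels, one per subdistrict
    populations: census populations of the same subdistricts
    Returns the constant a and the coefficient b (hypothetical names).
    """
    n = len(tud_totals)
    mean_t = sum(tud_totals) / n
    mean_p = sum(populations) / n
    sxx = sum((t - mean_t) ** 2 for t in tud_totals)
    sxy = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(tud_totals, populations))
    b = sxy / sxx
    a = mean_p - b * mean_t
    return a, b


def grid_populations(grid_tud, a, b):
    """Estimate the population of each grid in one subdistrict.

    The constant a is shared among the grids in proportion to their TUD
    values, so grids with larger TUD values receive a larger share.
    """
    total = sum(grid_tud)
    if total == 0:
        return [0.0 for _ in grid_tud]
    return [b * t + a * t / total for t in grid_tud]


# Made-up example: three subdistricts, then three grids of one subdistrict.
a, b = fit_calibration([100, 200, 300], [2500, 4500, 6500])
estimates = grid_populations([10, 30, 60], a, b)
```

Because the constant is shared proportionally, the grid estimates of a subdistrict add up to the value the regression line gives for that subdistrict's total TUD.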
2.1.2. Crowd Density Estimation
Based on the TUD–population calibration method, the real population in each grid can be estimated, and then the total number of people in each open public space can be obtained through spatial overlay analysis between the population assessment result and the spatial data of the open public spaces. The crowd density in the $k$-th open public space at time $t$ can be obtained by Equation (3) as follows:

$$D_k(t) = \frac{\sum_{j} P_{kj}(t)}{A_k} \tag{3}$$

where $D_k(t)$ is the estimated crowd density of the $k$-th open public space at time $t$; $P_{kj}(t)$ is the population of the $j$-th grid in the $k$-th open public space; $A_k$ is the area of the $k$-th open public space.
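As a small illustration of the density calculation, the sketch below divides the summed grid populations of one open public space by its area. The function name `crowd_density` and the use of hectares as the area unit are assumptions made for this example.

```python
def crowd_density(grid_populations, area_ha):
    """Crowd density of one open public space in people per hectare.

    grid_populations: estimated populations of the grids inside the space
    area_ha:          area of the space in hectares (assumed unit)
    """
    if area_ha <= 0:
        raise ValueError("area must be positive")
    return sum(grid_populations) / area_ha


# Made-up example: 2500 people on 2.5 ha.
density = crowd_density([250.0, 750.0, 1500.0], 2.5)
```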
The correlation coefficient (CC) and mean absolute error (MAE), which are well-known accuracy assessment indexes of population mapping, were selected to comprehensively assess the accuracy of the crowd density estimation result. The CC and MAE are calculated by comparing the estimated crowd density and the crowd density samples generated by the officially released real-time passenger flow data or by field surveys. The equations for these indexes are as follows:

$$CC = \frac{\sum_{i=1}^{n}\left(D_i-\bar{D}\right)\left(\hat{D}_i-\bar{\hat{D}}\right)}{\sqrt{\sum_{i=1}^{n}\left(D_i-\bar{D}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(\hat{D}_i-\bar{\hat{D}}\right)^2}} \tag{4}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{D}_i - D_i\right| \tag{5}$$

where $D_i$ is the actual crowd density of the i-th sample, $\bar{D}$ is the mean value of the actual crowd density of all samples, $\hat{D}_i$ is the estimated crowd density of the i-th sample, $\bar{\hat{D}}$ is the mean value of the estimated crowd densities of all samples, and $n$ is the number of samples for accuracy assessment.
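The two accuracy indexes can be computed directly from paired samples. The sketch below uses the standard Pearson correlation coefficient and mean absolute error; the function names and the sample values are hypothetical.

```python
import math


def correlation_coefficient(actual, estimated):
    """Pearson correlation between actual and estimated crowd densities."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((x - mean_a) * (y - mean_e) for x, y in zip(actual, estimated))
    var_a = sum((x - mean_a) ** 2 for x in actual)
    var_e = sum((y - mean_e) ** 2 for y in estimated)
    return cov / math.sqrt(var_a * var_e)


def mean_absolute_error(actual, estimated):
    """Average absolute difference between estimated and actual densities."""
    return sum(abs(y - x) for x, y in zip(actual, estimated)) / len(actual)


# Made-up validation samples (people/ha).
actual = [100.0, 200.0, 300.0]
estimated = [110.0, 190.0, 320.0]
cc = correlation_coefficient(actual, estimated)
mae = mean_absolute_error(actual, estimated)
```

A CC close to 1 together with a small MAE indicates that the estimates follow the reference densities in both trend and magnitude.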
2.2. Risk Assessment of Crowd-Gathering in Open Public Space
As mentioned in the Introduction, the risk assessments of crowd-gathering in previous studies mostly focused on a single public space from a microscopic perspective. To our knowledge, no previous study has measured the spatial distribution of crowd-gathering risk at the microscopic level and spatio-temporally across large regions, because effective models and data were lacking. However, it would greatly help the government and relevant departments if the key urban open public spaces that need intensive crowd-gathering prevention could be identified and the risk rank of crowd-gathering could be provided for public safety management. Therefore, we developed a new method, supported by multisource geospatial big data, to assess the risk of crowd-gathering in a large area. This method takes into account various characteristics of public spaces and crowds. First, the key open public spaces that need intensive crowd-gathering prevention are determined based on the crowd density threshold. Then, the crowd-gathering risks of the key open public spaces at different times are quantitatively assessed by the crowd-gathering risk assessment method. Finally, a risk map that describes the crowd-gathering risk rank of the various open public spaces can be produced.
2.2.1. Identification of Key Open Public Spaces
When an emergency occurs, individuals in a crowded open public space may panic and show abnormal behaviors such as screaming, shoving, and falling, which may cause the whole crowd to fall into chaos [36]. Therefore, it is necessary to identify the open public spaces with high crowd densities. According to the literature on crowd density in open public spaces, when the crowd density exceeds 40–300 people/ha, the people in an open public space no longer feel safe and comfortable. When the crowd density exceeds 1000–5000 people/ha, the crowd will be in a crowded state [37]. In this paper, crowd density thresholds of 1000 and 300 people/ha were used to identify key open public spaces and secondary key open public spaces, respectively. The specific identification method is shown in Equations (6) and (7) as follows.

$$D_k^{\max} = \max_{t} D_k(t) \tag{6}$$

$$F(k) = \begin{cases} \text{key open public space}, & D_k^{\max} \geq 1000 \\ \text{secondary key open public space}, & 300 \leq D_k^{\max} < 1000 \end{cases} \tag{7}$$

where $D_k(t)$ is the average population density of the open public space $k$ at time $t$; $D_k^{\max}$ is the maximum crowd density of the open public space $k$ in the whole period of time; $F(k)$ is a function to judge whether $k$ is a key public space or a secondary key open public space.
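The identification rule can be illustrated with a short classifier over the density time series of one open public space. The thresholds of 1000 and 300 people/ha come from the text; treating a density exactly equal to a threshold as exceeding it is an assumption of this sketch, as are the function name and the label strings.

```python
KEY_THRESHOLD = 1000.0        # people/ha, key open public space
SECONDARY_THRESHOLD = 300.0   # people/ha, secondary key open public space


def classify_space(densities_over_time):
    """Classify one open public space by its peak crowd density.

    densities_over_time: crowd densities (people/ha) at the observed times
    Returns "key", "secondary key", or "other" (hypothetical labels).
    """
    peak = max(densities_over_time)
    if peak >= KEY_THRESHOLD:        # boundary handling is an assumption
        return "key"
    if peak >= SECONDARY_THRESHOLD:
        return "secondary key"
    return "other"


# Made-up hourly densities for three spaces.
labels = [
    classify_space([200.0, 850.0, 1200.0]),
    classify_space([150.0, 420.0, 380.0]),
    classify_space([20.0, 35.0, 90.0]),
]
```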
2.2.2. Crowd-Gathering Risk Assessment
In previous studies, the crowd-gathering risk was usually assessed based on factors such as place characteristics, management measures, and crowd characteristics [16], since these factors directly relate to the probability and consequences of accidents. However, most of these studies ignored the impact of external spaces on crowd-gathering risk. The importance of considering the factors of external environments has been explained in the Introduction.
In general, risk is usually described as a combination of the consequences derived from the range of possible accidents and the likelihood of these accidents [38,39]. In the field of crowd-gathering risk, scholars usually describe the crowd-gathering risk by the possibility of the crowd accident occurring and the consequences of accidents caused by crowd [11,16,40]. For example, Liu et al. expressed the crowd-gathering risk in urban public spaces by the quantified equation “$R = P \times C$”, where $P$ is the occurring possibility and $C$ is the consequences of crowd accidents. Hence, based on risk theory and previous studies, the crowd-gathering risk of open public spaces in this study is expressed as follows [16]:

$$R = P \times S \times A \tag{8}$$

where $R$ is the crowd-gathering risk of an open public space; $P$ represents the probability of an accident occurring in an open public space under the crowd-gathering condition; $S$ represents the severity of the accident consequence of an open public space; $A$ is the risk aversion factor.
Probability of the Crowd Accident Occurrence
The probability of an accident occurring in an open public space can be represented by the cumulative weighted values of four factors: the crowd density, gathering tendency, place dangerousness, and the traffic situation of the external environments. The weight of each factor is determined by the analytic hierarchy process (AHP) method. The specific equation is as follows:

$$P = w_1 D + w_2 G + w_3 H + w_4 E \tag{9}$$

where $P$ is the probability of the crowd accident occurring; $D$, $G$, $H$, and $E$ represent the crowd density, gathering tendency, place dangerousness, and traffic situation, respectively; $w_1$, $w_2$, $w_3$, and $w_4$ are the weights of the four factors. For the integration of the various factors, normalization is used to distribute values of these factors between 0 and 1.
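The weighted combination of normalized factors can be sketched as follows. Min-max scaling is assumed here as the normalization, since the text only states that the factors are mapped to the range 0 to 1, and the weights in the example are placeholders rather than the AHP weights of the study.

```python
def min_max_normalize(values):
    """Scale a list of raw factor values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def accident_probability(factors, weights):
    """Weighted sum of the four normalized factors for one space.

    factors: (crowd density, gathering tendency, place dangerousness,
              traffic situation), each already normalized to [0, 1]
    weights: one weight per factor, e.g. obtained from AHP
    """
    return sum(w * f for w, f in zip(weights, factors))


# Made-up raw densities of three spaces, normalized across the spaces.
normalized_density = min_max_normalize([300.0, 650.0, 1000.0])

# Placeholder weights that sum to 1 (not the values used in the study).
weights = (0.4, 0.3, 0.2, 0.1)
probability = accident_probability((1.0, 0.5, 0.0, 0.5), weights)
```

When the weights sum to 1 and each factor lies between 0 and 1, the resulting index also stays between 0 and 1.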
The gathering tendency of the crowd is built based on the change of crowd density in a public space at different times, which can be expressed by a function of the maximum crowd density and the change rate of crowd density in this period, shown in Equation (10) as follows:
where $T$ is the time period from $t_1$ to $t_2$; $G_T$ is the gathering tendency of the time period $T$; $D(t)$ is the crowd density at the time $t$.
The place dangerousness is also an important factor in the risk assessment of crowd-gathering. Inspired by the studies of Liu et al. [16] and Ye et al. [41], the place dangerousness factor jointly considers the criteria of the following four categories (Table 1): (1) number of obstacles, (2) affected degree of steps, (3) affected degree of slopes, and (4) affected degree of ground pavements/materials. The value of the place dangerousness is calculated as the cumulative score of the four criteria in Table 1. The criteria are collected and assessed from high-resolution remote sensing images, photos of open public spaces obtained from social media platforms, and field surveys. Each criterion is assigned a score from 0 to 5. The score assignments are determined according to the specific situations of the open public spaces in the study area.
Traffic Situation of the External Environment
Urban open public spaces are quite different from other urban places, such as residential communities or schools. Most urban open public spaces are closely connected to the external environment, such as the streets and roads in the surrounding areas. When an emergency occurs, the traffic situation of the external roads affects the crowd-gathering situation inside the open public space. Congested traffic makes rescue more difficult because rescue vehicles cannot enter the crowd accident area. Therefore, in this study, the average traffic congestion degree of the roads surrounding an open public space is taken as the traffic situation factor of the open public space. The traffic situation of the external environment for an open public space is calculated as follows:

$$E = \frac{1}{m}\sum_{r=1}^{m} E_r \tag{11}$$

where $m$ is the total number of adjacent roads of an open public space; $E_r$ is the traffic situation of the $r$-th adjacent road. The traffic situation of each road is divided into four levels: 0 (unknown), 1 (free-flowing), 2 (congested), and 3 (blocked).
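A minimal sketch of the traffic situation factor is given below. It takes the plain average of the road levels, including roads coded 0 (unknown), because the text does not describe any special handling of that level; the function name and the example levels are hypothetical.

```python
def external_traffic_situation(road_levels):
    """Average traffic level of the roads adjacent to one open public space.

    road_levels: one integer per adjacent road, coded as
                 0 unknown, 1 free-flowing, 2 congested, 3 blocked
    """
    if not road_levels:
        raise ValueError("at least one adjacent road is required")
    for level in road_levels:
        if level not in (0, 1, 2, 3):
            raise ValueError("traffic level must be 0, 1, 2 or 3")
    return sum(road_levels) / len(road_levels)


# Made-up example: four adjacent roads.
traffic = external_traffic_situation([1, 2, 3, 2])
```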
Severity Index
The severity of accidents in open public spaces is related to the size of the crowd exposed to the accident and to the crowd structure (such as gender and age). The greater the crowd size exposed to the accident and the higher the proportion of vulnerable groups, the more serious the accident casualties are likely to be. However, it is difficult to obtain data on the crowd structure and the specific location of accidents in open public spaces. Therefore, relevant studies have mostly used crowd-gathering size to indicate the severity of accidents [42]. In this study, the high crowd mobility inside and outside the open public space is also considered. People outside the public space may also be affected by the crowd-gathering accident. Therefore, the weighted sum of the number of people inside the public space and the number of people outside the public space is used to express the severity of the accident consequence as follows:

$$S = w_{\mathrm{in}} N_{\mathrm{in}} + w_{\mathrm{out}} N_{\mathrm{out}} \tag{12}$$

where $N_{\mathrm{in}}$ and $N_{\mathrm{out}}$ represent the number of people inside and outside the open public space, respectively; $w_{\mathrm{in}}$ and $w_{\mathrm{out}}$ are the weight coefficients of $N_{\mathrm{in}}$ and $N_{\mathrm{out}}$, respectively.
Risk Aversion Factor
The risk aversion factor is also an important factor in the risk assessment. Risk assessment should consider not only the extent of the direct damage (such as the number of fatalities or monetary costs), but also the indirect impact of the accident (such as the social and economic impact) [43]. Both direct and indirect effects should be included in the quantitative risk assessment, which can be quantitatively represented by the risk aversion factor in this study. The risk aversion factor depends on two factors as follows:

$$A = \lambda_1 A_{\mathrm{rep}} + \lambda_2 A_{\mathrm{type}} \tag{13}$$

In this equation, $A_{\mathrm{rep}}$ represents the influence of the reputation of an open public space on the risk aversion factor. Crowd-gathering accidents occurring in a public space that has a high reputation or is near well-known places (such as a large conference site) will cause greater social impact compared to those occurring in the public spaces that are not well known. These accidents will have higher values of $A_{\mathrm{rep}}$. $A_{\mathrm{type}}$ represents the influence of public space type on the risk aversion factor. $\lambda_1$ and $\lambda_2$ are the weight coefficients of the two factors.
The value evaluation standard of $A_{\mathrm{rep}}$ for different open public spaces is shown in Table 2. The social media user comment data are used to evaluate the reputation of open public spaces. The spatial distance analysis is used to evaluate the proximity of open public spaces to well-known places.
The influence of public space type on the risk aversion factor ($A_{\mathrm{type}}$) is evaluated according to the standard in Table 3.
The above equations, Equations (8) to (13), are integrated to calculate the crowd-gathering risk of open public spaces quantitatively. After the risks of the different open public spaces are ranked, a risk map can be generated to describe the rank and spatial distribution of crowd-gathering risk. Based on the information provided in the map, the government and relevant administrative departments can identify the key open public spaces that need intensive crowd-gathering prevention and determine crowd control measures to reduce the crowd-gathering risk.
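The integration step can be sketched as follows. The sketch assumes that the three components are combined as a product, which is the usual reading of classical risk theory but is an assumption here; all function names, weights, and input values are hypothetical placeholders.

```python
def severity(n_inside, n_outside, w_inside=0.7, w_outside=0.3):
    """Weighted sum of people inside and outside the space (placeholder weights)."""
    return w_inside * n_inside + w_outside * n_outside


def risk_aversion(reputation_score, type_score, w_reputation=0.5, w_type=0.5):
    """Weighted sum of the reputation and space-type scores (placeholder weights)."""
    return w_reputation * reputation_score + w_type * type_score


def crowd_gathering_risk(probability, severity_value, aversion):
    """Risk as the product of probability, severity and risk aversion (assumed form)."""
    return probability * severity_value * aversion


def rank_spaces(risks):
    """Return the space names ordered from highest to lowest risk."""
    return sorted(risks, key=risks.get, reverse=True)


# Made-up inputs for two open public spaces.
risks = {
    "plaza_a": crowd_gathering_risk(0.8, severity(1000, 500), risk_aversion(4, 2)),
    "plaza_b": crowd_gathering_risk(0.4, severity(400, 100), risk_aversion(2, 2)),
}
ranking = rank_spaces(risks)
```

The ranked values can then be joined to the polygons of the open public spaces to draw the thematic risk map.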
5. Conclusions
This study proposes a new approach supported by multisource geospatial big data to measure the crowd-gathering risk in urban open public spaces. First, a new crowd density estimation method based on Tencent user density (TUD) data is proposed. In this method, TUD data are calibrated against the residential population using a linear regression function, so that the dynamic crowd density in the urban open public spaces can be accurately estimated. Second, a spatio-temporal crowd-gathering risk measurement method for urban public spaces is presented. Key open public spaces that need intensive crowd-gathering prevention can be determined based on reasonable crowd density thresholds. For these key open public spaces, a quantified crowd-gathering risk assessment can be conducted on the basis of risk theory, which considers the occurrence probability of crowd-gathering events, the severity, and the risk aversion factor. Detailed risk ranks can be spatially described on thematic maps. The advantages of this study lie in two aspects: (1) Fine-scale crowd density in urban open public spaces in large areas can be estimated by calibrating the spatio-temporal big data (TUD data), which overcomes the shortcomings of traditional estimation methods. (2) The proposed risk assessment method of crowd-gathering in urban open public spaces simultaneously considers the internal and external environments of the open public spaces. External risk factors, such as the traffic situation and the number of people outside the open public spaces, are taken into consideration, which makes the crowd-gathering risk assessment more rational and comprehensive.
A case study in the area within the Outer-ring Road of Shanghai, China was conducted to demonstrate the feasibility of the proposed method. By utilizing the proposed TUD–population calibration method, the crowd densities at different times were calculated for each open public space in the study area. The crowd density in various open public spaces differed significantly between weekdays and weekends. From the center area to the urban fringe, the crowd densities of open public spaces showed a tendency to decrease. Key and secondary key open public spaces that need intensive crowd-gathering prevention were identified by the crowd density thresholds. Utilizing the proposed crowd-gathering risk assessment method, the spatial and temporal distribution of crowd-gathering risk in urban open public spaces in the study area was described on thematic maps. The kernel density analysis showed a similar polycentric spatial distribution of crowd-gathering risk for weekdays and weekends. According to the risk rank and spatial distribution provided in the map, governments can determine the crowd control measures to be taken in different open public spaces, make reasonable arrangements of field staff, and optimize the sign design to reduce the crowd-gathering risk and prevent the occurrence of dangerous events.
Limited by the availability of data, the proposed method has some limitations. (1) Given the spatial resolution of the TUD data (the size of a pixel in the study area is 27 m × 27 m), people in nearby parcels, such as office buildings, may be counted in the crowd density estimation of open public spaces, which reduces the accuracy of the estimation. (2) This paper adopts a manual scoring method, instead of simulation methods, to score the place dangerousness factors of open public spaces, which may not fully represent the impact of obstacles on the crowd-gathering risk. (3) This study uses the officially issued real-time passenger flow data to verify the precision of the crowd density estimation; however, only a small number of scenic areas have such officially issued data. In the future, the crowd densities obtained from video footage recorded by video cameras or from field surveys in open public spaces can be used to verify the crowd density estimation.