COVID-19 Risk Assessment: Contributing to Maintaining Urban Public Health Security and Achieving Sustainable Urban Development

Abstract: As the most infectious disease of 2020, COVID-19 has been an enormous shock to urban public health security and to sustainable urban development. Although the epidemic in China has been brought under control, its prevention and control remains the top priority for maintaining public health security. Accurate assessment of epidemic risk is therefore of great importance for preventing, controlling and ultimately overcoming COVID-19. Using data fused from multi-source big data such as POI (Point of Interest) data and Tencent-Yichuxing data, this study combines a logistic regression model with a geodetector model to assess the epidemic risk and to analyze the main factors that affect the distribution of COVID-19. The main conclusions are as follows: the high-risk areas of the epidemic are mainly concentrated in areas with relatively dense permanent and floating populations, which means that the permanent population and the floating population are the main factors affecting the risk level of the epidemic. In other words, reasonable control of population density is highly conducive to reducing the risk level of the epidemic. The control of regional population density therefore remains the key to epidemic prevention and control, and home isolation is the best means of achieving it. The precise assessment and analysis of the epidemic conducted in this study is of great significance for maintaining urban public health security and achieving sustainable urban development.


Introduction
Corona Virus Disease (COVID-19), which raged around the world throughout 2020, has not only adversely affected global public health security but also seriously threatened human health [1,2]. Although the whole country is actively coordinating to control the epidemic, COVID-19 is still spreading. Accurately identifying the current high-risk areas and assessing the risk level of the epidemic in different areas are therefore both important prerequisites for formulating epidemic prevention policies [3]. With the approach of winter in the Northern Hemisphere, COVID-19 is becoming more active, so preventing a second outbreak remains an important challenge for the global response to the epidemic. The prevention and control of COVID-19 will thus continue to be an important issue for maintaining urban public security and achieving sustainable urban development. This is also why an assessment of the current epidemic risk can provide reliable support for urban safety decision-making.
It is generally believed that relatively effective anti-infection measures include limiting mass human migration, classifying areas with more confirmed cases as high-risk areas, and locking down smaller regions where the epidemic risk is relatively high. Although these measures did play an active role in the prevention and control of COVID-19, they failed to highlight the key areas [4]. Therefore, the accurate [...] At present, logistic regression has achieved good results in urban land prediction [46], urban expansion simulation [47] and urban spatial change [48]. This study uses the advantages of logistic regression models in urban spatial prediction to assess the current risk level of COVID-19.
Due to the potential interdependence among the observed data of different variables distributed in the same region, the spatial factors affecting the risk level of the epidemic show obvious spatial differentiation [49]. At present, there are few analytical methods for spatial differentiation; they mainly include the spatial analytical measure [49], geodetector statistics [50], MSN of stratified samples [51], Bshade of sample deviation [52], the SPA model of single point samples [53] and the Sandwich model of multi-unit conversion [54]. Geodetector is a statistical method that detects spatial differentiation and reveals its driving forces. Its underlying principle is that if an independent variable has an important influence on the dependent variable, then the spatial distributions of the independent variable and the dependent variable should be similar [50]. Compared with other spatial differentiation analysis methods, the geodetector can detect not only the data themselves but also the interaction between different factors [55,56]. It can therefore both analyze the main role of each factor and judge the relationships between factors, which other spatial differentiation analysis methods cannot do [57].
Beijing, Shanghai, Guangzhou and Shenzhen, the super first-tier cities, have not experienced large-scale COVID-19 infections like those in Wuhan. They are nevertheless the regions under the greatest potential threat from COVID-19, because they have the most complex population flows between cities and between cities and regions in mainland China. In order to prevent and control the epidemic more efficiently, with a more rational use of anti-epidemic resources and a clearer focus, this study takes Guangzhou and Wuhan as cases. It first uses a logistic regression model to fuse big data such as POI and Tencent-Yichuxing data and evaluates the current epidemic risk level on this basis. The relationships among the spatial factors affecting the epidemic risk level are then analyzed using the geodetector model, and the accuracy is finally verified. All of this provides an important reference for the formulation of epidemic prevention policy.

Study Area
Guangzhou is located between longitude 112°57′ E and 114°03′ E and between latitude 22°26′ N and 23°56′ N (Figure 1). It is generally considered one of the most significantly urbanized cities, with a permanent population of 15.3059 million in 2019 [58]. An accurate risk assessment of COVID-19 in Guangzhou can contribute greatly to predicting high-risk areas, to focusing epidemic prevention and control on key areas, and to formulating rational prevention and control strategies. To make the conclusions of this study more widely applicable, Wuhan, the city with the most COVID-19 cases in China, is selected for case verification. Wuhan is located between longitude 113°41′ E and 115°05′ E and between latitude 29°58′ N and 31°22′ N (Figure 2). As the central city of middle China and an important industrial base, Wuhan has a total area of 8569.15 square kilometers and a permanent population of 11.212 million. From the end of 2019 to the beginning of 2020, the number of COVID-19 infections in Wuhan was more than half of the total in mainland China. The case verification in Wuhan can therefore further verify the correctness of the conclusions of this study [59].

Study Data
This study aims to explore the influence of different urban spatial factors on the distribution of COVID-19 risk levels from the perspective of urban geographic space. Although the urban interior environment (ventilation, pollution, etc.) can also affect the risk level of COVID-19, it is beyond the scope of this study, so only urban spatial elements are considered here. During the outbreak of COVID-19, the urban spatial factors that affect the distribution of COVID-19 risk mainly refer to places where people interact and gather, such as hospitals, transportation stations, hotels, restaurants, supermarkets, markets, schools, administrative centers, cultural sites and sports stadiums [60,61]. These spatial factors are important parts of urban space [23,62]. In research on urban geographical space, hospitals, transportation stations, hotels, restaurants, supermarkets and markets all play important roles in the flows within urban space, so they should be taken into consideration [63,64].
On the one hand, the risk of outbreaks in public service places such as schools and administrative centers is relatively low [65,66], mainly thanks to the extremely strict population limits and epidemic prevention measures in these places, for example remote online teaching [67], online office work [68,69] and other restrictive policies [70]. On the other hand, public places such as traffic stations and supermarkets are at high risk of outbreak because population concentration there is difficult to control [71]. Considering the above, the urban spatial factors analyzed for their influence on the distribution of COVID-19 risk levels are urban population (permanent population and floating population), transportation stations, hotels, restaurants and living space (supermarkets, markets, etc.) [47]. Among them, the urban population includes all the population in the study area, and the traffic stations represent all the traffic flows during the epidemic. Urban traffic can also represent the traffic flow between regions, but such flow only reflects people's travel behavior and cannot directly reflect the degree of population agglomeration: an area with large traffic flow may not contain many people, whereas an area with dense traffic stations will show obvious population agglomeration. Living space refers to the public space that the urban population needs to enter in order to live during the epidemic.
Sustainability 2021, 13, 4208
COVID-19 data: The COVID-19 data include the number of designated clinics (fever clinics) appointed by the government and the number of new infections. The municipal government of Guangzhou designated 102 fever clinics in total. The data on new infections only go up to April 2020, as there have been no new cases since then. There were 349 confirmed cases up to April 2020, of which 137 were confirmed in January, 209 in February, 2 in March and only 1 in April. In Wuhan, the number of fever clinics is 134 and, as of April 2020, the total number of people infected with COVID-19 was 50,333. The spatial distribution of fever clinics and confirmed COVID-19 cases in Guangzhou is shown in Figure 3, and that in Wuhan is shown in Figure 4.
Population data: The population data include permanent population data and floating population data. The permanent population data were mainly obtained from the Guangdong and Wuhan Statistical Yearbooks of 2019. According to these statistics, the permanent population of Guangzhou reached 15.3059 million by the end of 2019, while that of Wuhan reached 11.212 million; the spatial distribution map of permanent population density was derived from these data. Floating population refers to the regional change of population within a certain range over a certain time, including its quantity and movement track. These data can be accessed from the positioning big data service window of Tencent (1: http://heat.qq.com/index.php, accessed on 1 October 2020). Tencent-Yichuxing data reflect the degree of population congestion in the current space and time. Compared with data such as heat maps, Tencent-Yichuxing data reflect not only the size of the population but also its movement among regions. By averaging the floating population data collected by the positioning system of the Tencent big data service window from January to April 2020, the spatial distribution map of floating population density was obtained at the smallest acquisition window (with a spatial resolution of 25 × 25 m) (Figures 5 and 6).
POI data: A total of 220,971 and 209,812 POI records for Guangzhou and Wuhan, respectively, were obtained for April 2020 using Amap (2: https://www.amap.com/, accessed on 1 October 2020), followed by duplicate checking and data cleaning. POI data were used in this study to explore the impact of urban spatial factors on the distribution of COVID-19, so the public-space POI data that have an impact on the epidemic in urban areas were screened. The screened POI data mainly include traffic stations, hotels, restaurants and living space (supermarkets, bazaars, etc.), numbering 57,882; 16,134; 15,009; and 52,606, respectively, in Guangzhou and 50,912; 12,281; 16,048; and 49,001 in Wuhan. In addition, in order to unify the observation unit, the floating population unit was resampled and the spatial distribution maps of the density of traffic stations, hotels, restaurants and living space in Guangzhou and Wuhan were generated (Figures 7 and 8).

Methods
This study has two purposes: to assess the risk levels of COVID-19 in the study area, and to analyze the spatial relationships among the urban spatial factors that influence the distribution of those risk levels. To achieve them, logistic regression is used to predict the risk level of COVID-19, and the geographic detector is used to analyze the differentiation relationships among the spatial factors.

Logistic Regression
Logistic regression is one of the most widely used methods in machine learning. Its predicted results are probabilities bounded between 0 and 1, which makes it not only easier to use and interpret but also more suitable for continuous and categorical independent variables than other methods [72]. Logistic regression uses the sigmoid function to explore the relationship between the independent variables and the dependent variable, so that the probability of epidemic occurrence can be analyzed quantitatively [73]. On the one hand, the logistic regression model has a clear advantage in training and recognition time over other models such as the Support Vector Machine (SVM) and neural networks. On the other hand, it is meaningful and useful only when the independent variables are significant. The relationship between the occurrence probability of COVID-19 and the various factors can be represented as follows:
$$P = \frac{1}{1 + e^{-z}} \tag{1}$$
where $P$ represents the occurrence probability of COVID-19 and ranges over [0, 1]. The closer the value of $P$ is to 1, the higher the occurrence probability; the closer the value of $P$ is to 0, the lower the occurrence probability. $z$ stands for a linear combination of the factors. The fitting equation of the LR model can therefore be represented as:
$$z = C + B_1 X_1 + B_2 X_2 + \cdots + B_n X_n \tag{2}$$
where $C$ stands for the intercept of the model and represents the error value of the occurrence probability of COVID-19 in urban space under the selected indicator factors; $B_1, B_2, \ldots, B_n$ stand for the LR coefficients and $X_1, X_2, \ldots, X_n$ for the index factors.
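The two relations above can be illustrated with a minimal Python sketch. It is not the fitted model of this study: the intercept, coefficients and factor values below are hypothetical numbers chosen only to show how $z$ and $P$ are evaluated, and the function names are illustrative.

```python
import math


def linear_combination(intercept, coefficients, factors):
    """Return z = C + B1*X1 + ... + Bn*Xn for one spatial unit."""
    if len(coefficients) != len(factors):
        raise ValueError("coefficients and factors must have the same length")
    return intercept + sum(b * x for b, x in zip(coefficients, factors))


def occurrence_probability(z):
    """Return P = 1 / (1 + exp(-z)), evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical values (assumptions, not results of the study).
C = -2.0                 # intercept
B = [1.5, 0.8]           # LR coefficients B1, B2
X = [0.9, 0.4]           # normalized index factors X1, X2

z = linear_combination(C, B, X)
P = occurrence_probability(z)
print(f"z = {z:.2f}, P = {P:.3f}")
```

A value of $P$ above a chosen threshold (0.5 is a common default, assumed here rather than taken from the study) would mark the unit as a risk area.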

Geodetector
According to Tobler's First Law of Geography, everything is related to everything else, but near things are more related than distant things. Therefore, if an independent variable has a significant impact on the dependent variable, the spatial distributions of the independent and dependent variables should be similar [74]. Geodetector is a statistical method based on the theory of spatial variance analysis, proposed by Wang Jinfeng and other scholars. It can not only detect the spatial variation of an impact factor but can also verify the coupling of the spatial distributions of two variables and explore the causal relationship between variables [75].
(1) Factor Detector
The spatial differentiation of COVID-19, and the extent to which a risk factor explains that differentiation, can be measured by $q$, which is expressed as follows:
$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2} = 1 - \frac{SSW}{SST} \tag{3}$$
$$SSW = \sum_{h=1}^{L} N_h \sigma_h^2, \qquad SST = N \sigma^2 \tag{4}$$
where $h = 1, \ldots, L$ indexes the layers (strata) of COVID-19 or of the risk factor, while $N_h$ and $N$ stand for the number of units in layer $h$ and in the whole region, respectively. $\sigma_h^2$ and $\sigma^2$ represent the variances in layer $h$ and in the whole region, respectively. $SSW$ and $SST$ represent the within sum of squares and the total sum of squares, respectively. The value range of $q$ is [0, 1], and the larger the value is, the more obvious the spatial differentiation of COVID-19. If the stratification is generated by a risk factor, the larger the value of $q$ is, the stronger the explanatory power of that risk factor for COVID-19, and vice versa.
A simple transformation of $q$ satisfies the noncentral $F$ distribution:
$$F = \frac{N - L}{L - 1} \cdot \frac{q}{1 - q} \sim F(L - 1, N - L; \lambda) \tag{5}$$
$$\lambda = \frac{1}{\sigma^2}\left[\sum_{h=1}^{L} \bar{Y}_h^2 - \frac{1}{N}\left(\sum_{h=1}^{L} \sqrt{N_h}\,\bar{Y}_h\right)^2\right] \tag{6}$$
where $\lambda$ stands for the noncentral parameter and $\bar{Y}_h$ stands for the mean value of layer $h$. Based on Equation (5), the geographic detector can be used to test whether $q$ is significant.
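The factor detector can be sketched in a few lines of Python. This is a minimal illustration of $q = 1 - SSW/SST$ under stated assumptions, not the software used in this study: the function names are hypothetical, population variances are assumed for $\sigma_h^2$ and $\sigma^2$ (which keeps $q$ within [0, 1]), and the toy incidence values and strata are invented for the example. The significance test against the noncentral $F$ distribution is omitted.

```python
from statistics import pvariance


def q_statistic(values, strata):
    """Factor detector: q = 1 - SSW / SST for one stratified factor."""
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")
    sst = len(values) * pvariance(values)      # SST = N * sigma^2
    if sst == 0:
        raise ValueError("values have zero variance; q is undefined")
    groups = {}
    for value, stratum in zip(values, strata):
        groups.setdefault(stratum, []).append(value)
    # SSW = sum over layers h of N_h * sigma_h^2
    ssw = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - ssw / sst


def f_transform(q, n_units, n_strata):
    """Transformed statistic F = (N - L) / (L - 1) * q / (1 - q)."""
    return (n_units - n_strata) / (n_strata - 1) * q / (1.0 - q)


# Toy data (assumption): incidence in six units, stratified by density class.
incidence = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]
density_class = ["low", "low", "low", "high", "high", "high"]

q = q_statistic(incidence, density_class)
print(f"q = {q:.4f}, F = {f_transform(q, 6, 2):.1f}")
```

Because the two density classes separate the toy incidence values almost perfectly, $q$ is close to 1; a stratification unrelated to incidence would give a value near 0.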
(2) Interaction Detector
The interaction detector identifies the interaction between different risk factors $X_n$: it assesses whether the combined action of factors $X_1$ and $X_2$ enhances or weakens their explanatory power for COVID-19, or whether the impacts of these risk factors on COVID-19 are independent of each other. By calculating $q(X_1)$ and $q(X_2)$ separately and then comparing $q(X_1 \cap X_2)$ with $q(X_1)$ and $q(X_2)$, the relationship between the two risk factors can be divided into the categories listed in Table 1.
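The comparison described above can be sketched as follows. This is an illustrative Python fragment, not the implementation used in this study: `overlay` and `classify_interaction` are hypothetical helper names, the numerical tolerance is an assumption, and only the four categories that are legible in Table 1 are distinguished. Boundary cases that the table does not cover are reported as "Unclassified".

```python
def overlay(strata_1, strata_2):
    """Combine two stratifications into the overlay X1 ∩ X2 (one label per unit)."""
    if len(strata_1) != len(strata_2):
        raise ValueError("both stratifications must cover the same units")
    return list(zip(strata_1, strata_2))


def classify_interaction(q1, q2, q12, tol=1e-9):
    """Classify q(X1 ∩ X2) against q(X1) and q(X2) using the rows of Table 1."""
    low, high = min(q1, q2), max(q1, q2)
    # Checked first, because q1 + q2 also exceeds Max(q1, q2) when both are positive.
    if abs(q12 - (q1 + q2)) <= tol:
        return "Independent"
    if q12 < low - tol:
        return "Weaken, nonlinear"
    if low + tol < q12 < high - tol:
        return "Weaken, uni-"
    if q12 > high + tol:
        return "Enhance, bi-"
    return "Unclassified"


# Hypothetical q values, for illustration only.
print(classify_interaction(0.30, 0.40, 0.50))   # Enhance, bi-
print(overlay(["low", "high"], ["few", "many"]))
```

The overlay labels can be passed as the stratification to any $q$ routine in order to obtain $q(X_1 \cap X_2)$.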

(3) Risk detector
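The risk detector tests whether there is a significant difference between the mean attribute values of two subregions, using the statistic $t$:
$$t = \frac{\bar{Y}_{h=1} - \bar{Y}_{h=2}}{\left[\dfrac{\mathrm{Var}(\bar{Y}_{h=1})}{n_{h=1}} + \dfrac{\mathrm{Var}(\bar{Y}_{h=2})}{n_{h=2}}\right]^{1/2}} \tag{7}$$
where $\bar{Y}_h$ stands for the mean value of the attribute in subregion $h$, which here represents the incidence rate of COVID-19; $n_h$ stands for the number of samples in subregion $h$, and $\mathrm{Var}$ stands for the variance. The statistic $t$ approximately obeys Student's $t$ distribution, with the degrees of freedom calculated as follows:
$$df = \frac{\left[\dfrac{\mathrm{Var}(\bar{Y}_{h=1})}{n_{h=1}} + \dfrac{\mathrm{Var}(\bar{Y}_{h=2})}{n_{h=2}}\right]^{2}}{\dfrac{1}{n_{h=1} - 1}\left[\dfrac{\mathrm{Var}(\bar{Y}_{h=1})}{n_{h=1}}\right]^{2} + \dfrac{1}{n_{h=2} - 1}\left[\dfrac{\mathrm{Var}(\bar{Y}_{h=2})}{n_{h=2}}\right]^{2}} \tag{8}$$
The null hypothesis is $H_0$: $\bar{Y}_{h=1} = \bar{Y}_{h=2}$. If $H_0$ is rejected at the chosen confidence level, there is a significant difference between the mean attribute values of the two subregions.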
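As a worked illustration of the risk detector, the sketch below computes the $t$ statistic and its degrees of freedom for two subregions. It is written under assumptions: the function name is hypothetical, sample variances are used for $\mathrm{Var}$, the degrees of freedom follow the standard Welch-Satterthwaite approximation, and the incidence values are invented. Looking up the critical value of Student's $t$ distribution is left out.

```python
import math
from statistics import mean, variance


def risk_detector_t(sample_1, sample_2):
    """Return (t, df) comparing the mean attribute of two subregions."""
    n1, n2 = len(sample_1), len(sample_2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each subregion needs at least two samples")
    a = variance(sample_1) / n1        # Var / n for subregion 1
    b = variance(sample_2) / n2        # Var / n for subregion 2
    if a + b == 0:
        raise ValueError("both subregions have zero variance")
    t = (mean(sample_1) - mean(sample_2)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a * a / (n1 - 1) + b * b / (n2 - 1))
    return t, df


# Invented incidence rates for two subregions (assumption).
t, df = risk_detector_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(f"t = {t:.3f}, df = {df:.3f}")
```

With only three samples per subregion the degrees of freedom are small, so even a sizeable difference in means may not be significant; real subregions would contain many more grid cells.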
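(4) Ecological detector
The ecological detector uses the statistic $F$ to test whether the two impact factors $X_1$ and $X_2$ differ significantly in their influence on the spatial distribution of attribute $Y$:
$$F = \frac{N_{X_1}\,(N_{X_2} - 1)\,SSW_{X_1}}{N_{X_2}\,(N_{X_1} - 1)\,SSW_{X_2}}, \qquad SSW_{X_1} = \sum_{h=1}^{L_1} N_h \sigma_h^2, \qquad SSW_{X_2} = \sum_{h=1}^{L_2} N_h \sigma_h^2 \tag{9}$$
where $N_{X_1}$ and $N_{X_2}$ represent the sample sizes of risk factors $X_1$ and $X_2$, respectively; $SSW_{X_1}$ and $SSW_{X_2}$ represent the sums of the within-layer variances in the layers formed by $X_1$ and $X_2$, respectively; and $L_1$ and $L_2$ represent the numbers of layers of risk factors $X_1$ and $X_2$, respectively. The null hypothesis is $H_0$: $SSW_{X_1} = SSW_{X_2}$. If $H_0$ is rejected at the chosen significance level, the spatial distribution effects of risk factors $X_1$ and $X_2$ are significantly different.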
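The ecological detector statistic can be sketched in Python as follows. This is a minimal example under assumptions: the helper names are hypothetical, population variances are assumed within each layer, both factors are observed on the same units (so $N_{X_1} = N_{X_2}$), and the data are invented. The comparison with the critical value of the $F$ distribution is omitted.

```python
from statistics import pvariance


def within_sum_of_squares(values, strata):
    """SSW = sum over layers h of N_h * sigma_h^2 for one stratified factor."""
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")
    groups = {}
    for value, stratum in zip(values, strata):
        groups.setdefault(stratum, []).append(value)
    return sum(len(g) * pvariance(g) for g in groups.values())


def ecological_f(values, strata_x1, strata_x2):
    """F = N_X1 (N_X2 - 1) SSW_X1 / (N_X2 (N_X1 - 1) SSW_X2)."""
    n1, n2 = len(strata_x1), len(strata_x2)
    ssw1 = within_sum_of_squares(values, strata_x1)
    ssw2 = within_sum_of_squares(values, strata_x2)
    if ssw2 == 0:
        raise ValueError("SSW of the second factor is zero; F is undefined")
    return (n1 * (n2 - 1) * ssw1) / (n2 * (n1 - 1) * ssw2)


# Invented data: one attribute Y and two candidate stratifications.
incidence = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]
by_population = ["low", "low", "low", "high", "high", "high"]
by_hotels = ["a", "b", "a", "b", "a", "b"]

print(f"F = {ecological_f(incidence, by_population, by_hotels):.4f}")
```

An $F$ value far from 1 indicates that one stratification leaves much less within-layer variance than the other, which is the situation the ecological detector flags as a significant difference.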

Table 1. Interaction relationships between two risk factors.
Description                                                   Interaction
q(X1 ∩ X2) < Min(q(X1), q(X2))                                Weaken, nonlinear
Min(q(X1), q(X2)) < q(X1 ∩ X2) < Max(q(X1), q(X2))            Weaken, uni-
q(X1 ∩ X2) > Max(q(X1), q(X2))                                Enhance, bi-
q(X1 ∩ X2) = q(X1) + q(X2)                                    Independent

Model Training
Based on COVID-19 data from January to April 2020 in Guangzhou and Wuhan, the study areas are divided into risk areas and risk-free areas. Because the eight spatial factors used in this study may be multicollinear, which would seriously bias the results of the logistic regression model, the factors are first subjected to collinearity diagnosis. TOL (tolerance) and VIF (variance inflation factor), which are reciprocals of each other, are common indicators of the degree of collinearity among factors; values close to 1 indicate little collinearity. Generally speaking, a VIF greater than 10 or a TOL less than 0.1 indicates a high degree of collinearity among factors, which does not meet the modeling conditions. The results of the multicollinearity analysis of the eight spatial factors are shown in Table 2: the VIF and TOL values of all factors are around 1. The multicollinearity diagnosis thus shows that all factors meet the modeling conditions, and the eight spatial factors are therefore introduced into model training.
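The collinearity diagnosis can be illustrated with a small, self-contained Python sketch. It relies on the identity that the VIF of factor $j$ equals the $j$-th diagonal element of the inverse correlation matrix, with TOL as its reciprocal. The helper names and the two toy factors are assumptions for illustration and do not reproduce Table 2.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant factor has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(factors):
    """Map each factor name to (VIF, TOL), where TOL = 1 / VIF."""
    names = list(factors)
    corr = [[1.0 if a == b else pearson(factors[a], factors[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: (inverse[i][i], 1.0 / inverse[i][i])
            for i, name in enumerate(names)}


# Toy factor values per grid cell (assumption, not the study data).
toy = {"population": [1, 2, 3, 4, 5], "stations": [2, 1, 4, 3, 5]}
for name, (vif, tol) in vif_and_tolerance(toy).items():
    flag = "high collinearity" if vif > 10 or tol < 0.1 else "acceptable"
    print(f"{name}: VIF = {vif:.3f}, TOL = {tol:.3f} ({flag})")
```

Since TOL is defined as the reciprocal of VIF, the two thresholds (VIF greater than 10, TOL less than 0.1) express the same criterion.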

Risk Assessment of COVID-19 of Guangzhou
The training results of the model show that the higher the risk level is, the higher the probability of COVID-19 occurrence will be. Combined with the actual geographical location, the distribution map of epidemic risk level in Guangzhou is obtained (Figure 9). According to this map, the regions with a high risk level are concentrated in Yuexiu District, Tianhe District, Liwan District and Haizhu District, which are the core areas of Guangzhou. A comparison of Figures 4-8 shows that the areas with higher COVID-19 risk are also those with denser transportation stations and restaurants, higher population mobility and higher resident population density. It can therefore be concluded that COVID-19 has a strong spatial correlation with traffic stations, restaurants and population distribution, while its spatial correlation with hotels and living space is relatively weak.
Yuexiu District, Tianhe District, Liwan District and Haizhu District all belong to the areas with higher population density in Guangzhou. Population movement and interaction are further strengthened by the denser traffic stations and restaurants in these four districts. After the outbreak of COVID-19, apart from the external traffic stations, new confirmed cases were mainly concentrated in these four districts, which shows that the density of permanent population is one of the most important factors influencing the transmission of the epidemic. In addition, the high-risk areas, including those in Haizhu District, are all urban villages with dense residential areas and relatively backward facilities.
There is also an obvious increase of confirmed imported cases at external traffic stations such as Baiyun International Airport and Guangzhou South Railway Station, where population mobility is the main factor behind the higher epidemic risk level. Previous studies show that the main modes of spread are population aggregation and large-scale population movement. Although COVID-19 was first reported in Wuhan, the route of transmission was cut off promptly once the Chinese government implemented the lockdown of Wuhan and restricted population mobility, and the epidemic was brought under efficient control. This is an excellent example showing that restricting population movement can indeed prevent and control the epidemic.
There is no doubt that the infection risk of COVID-19 increases greatly if the population is exposed to a dangerous environment for a long time. The anti-epidemic policy of "home quarantine" was decisively implemented after the outbreak of COVID-19 in China, which not only brought population mobility across the country under control in a short time but also promptly reduced population density in public areas. Although this policy has efficiently reduced the epidemic risk, public areas with higher population density and a larger floating population are still high-risk areas at present.

Verification of COVID-19 Risk Level in Wuhan
Although the confusion matrix and ROC curve can demonstrate the correctness of the model and results of this study, the number of COVID-19 infections in Guangzhou is only 349, while the total population of Guangzhou is 15.3059 million, so the results may have a certain degree of randomness. Therefore, Wuhan, the city with the most COVID-19 patients in China, was selected as the verification area in this study. Because Wuhan is one of the cities with the largest number of COVID-19 infections in China, many studies on the risk of COVID-19 there have been carried out. The correctness of this study can be judged by comparing the COVID-19 risk calculated here with that calculated by other relevant studies.
By the end of April 2020, a total of 50,333 people had been infected with COVID-19 in Wuhan. The evaluation results can directly demonstrate the correctness of this study. By collecting urban spatial element data of Wuhan and importing them into the model, the risk distribution of COVID-19 in Wuhan is obtained, as shown in Figure 10.
The distribution of the risk level of COVID-19 in Wuhan shows that high-risk areas were mainly concentrated in Wuchang District, Jianghan District, Qiaokou District and Hongshan District, which the Google map of Wuhan shows to be old urban areas with a more concentrated permanent population. Moreover, these high-risk areas are relatively dense in floating population and other public facilities. This result is consistent with the results of epidemic risk assessments of Wuhan obtained through other methods and research perspectives [76][77][78], which further illustrates the correctness of the results of this study.
A comparison of the results for Guangzhou and Wuhan shows that, although there is no significant difference in the number of permanent residents between the two cities, the number of infected cases in Wuhan is far greater than that in Guangzhou, and the risk level of COVID-19 differs between regions. The analysis of the data, methods and results of this study further verifies the accuracy of the assessed distribution of epidemic risk level in Guangzhou and Wuhan. It indicates that, although the sample size of COVID-19 cases affects the risk level value of individual regions (the larger the sample size, the higher the risk level), it does not affect the distribution of epidemic risk level across the entire region. Therefore, large cities where the epidemic is more prevalent and more patients are infected with COVID-19 can also use the method proposed in this study to assess the risk level of COVID-19.

Verification of Confusion Matrix
Accuracy verification is important for checking the logistic regression model's assessment of the COVID-19 risk level. It was used here to verify the distribution of epidemic risk level in Guangzhou and Wuhan shown in Figures 9 and 10. The confusion matrix results are shown in Table 3: the verification precision for risk areas and non-risk areas is 0.892 and 0.996, respectively, with Kappa values of 0.806 and 0.811, which shows that the logistic regression model assesses the epidemic risk level with high accuracy.
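To make the quantities in this verification concrete, the following Python sketch builds a binary confusion matrix and derives per-class precision and Cohen's Kappa from it. The labels are invented (1 for a risk area, 0 for a non-risk area) and the function names are illustrative; how the per-class precision in Table 3 was defined is an assumption here, and the printed values are unrelated to Table 3.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = risk and 0 = non-risk."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn


def cohen_kappa(tp, fp, fn, tn):
    """Kappa = (p_o - p_e) / (1 - p_e): agreement corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Invented labels for ten grid cells (assumption).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision_risk = tp / (tp + fp)        # share of predicted risk cells that are risk
precision_non_risk = tn / (tn + fn)    # share of predicted non-risk cells that are non-risk
kappa = cohen_kappa(tp, fp, fn, tn)
print(f"precision: {precision_risk:.3f} / {precision_non_risk:.3f}, kappa: {kappa:.3f}")
```

Kappa is lower than the raw agreement because it discounts the matches that two random labelings with the same class proportions would produce by chance.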

Verification of ROC (Receiver Operating Characteristic) Curve
ROC curve verification evaluates the prediction accuracy of the logistic regression model comprehensively by means of the AUC (Area Under Curve) value; the closer the value is to 1, the higher the prediction accuracy of the model. Figures 11 and 12 show that the AUC values of the training sample, the test sample and the overall data are all 0.99. These values are close to 1, which shows not only that the assessment is highly accurate but also that the logistic regression model can assess the risk distribution of COVID-19 accurately.
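The AUC value can be computed directly from predicted probabilities, as in the minimal Python sketch below. It uses the rank interpretation of AUC (the probability that a randomly chosen risk cell receives a higher score than a randomly chosen non-risk cell, with ties counted as one half). The function name, labels and scores are invented for illustration and are unrelated to Figures 11 and 12.

```python
def roc_auc(labels, scores):
    """AUC as the share of (risk, non-risk) pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented labels (1 = risk area) and predicted probabilities (assumption).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based formulation would be preferable for the full grid of a city.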

Risk Factor Detector
The factor detection results in Figure 7 show that permanent population density is the most important factor determining the risk level of COVID-19, followed by population mobility, which is consistent with the assessment result of the epidemic risk level. This also indicates that the most efficient anti-epidemic measure is to avoid population mobility and aggregation through home quarantine. The densities of traffic stations, living markets and restaurants have similar effects on the distribution of COVID-19, all weaker than those of permanent population density and population mobility. The reason is that traffic stations, restaurants and living spaces affect the risk of the epidemic only through the people who gather in them. As long as the population density in public areas is controlled, the risk of the epidemic will decrease. This further shows that a rational anti-epidemic measure is to reduce population flow and interaction so that people are not exposed to public environments in areas where population accumulates. During the epidemic period, people's demand for daily living is the highest, followed by catering and transportation. Different demands are reflected in different risk level distributions, which is also shown in the risk factor detector table (Table 4). The factors with the lowest effect on the risk level of COVID-19 are the densities of hotels and fever clinics, for the following reasons. On the one hand, the decrease in the number of people going out during the epidemic directly reduces the number of people staying in hotels, which makes the spread of COVID-19 more difficult. Moreover, even if some hotels are selected as isolation hotels, the epidemic prevention measures taken there also reduce the epidemic risk level. On the other hand, wherever a case appears, infected people can be sent to a fever clinic for timely treatment.
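The factor detector ranks factors by the geodetector q statistic, which measures how much of the variance of the risk level is explained by stratifying the study area according to one factor. The sketch below is a minimal illustration under the usual definition q = 1 - SSW/SST; the function name `q_statistic` and the sample values are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def q_statistic(values, strata):
    """Geodetector q = 1 - SSW / SST.

    values: risk value of each spatial unit.
    strata: class label of each unit for one factor (e.g. density class).
    """
    n = len(values)
    mean = sum(values) / n
    sst = sum((v - mean) ** 2 for v in values)  # total sum of squares
    if sst == 0:
        raise ValueError("values have zero variance")
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)
    ssw = 0.0  # within-stratum sum of squares
    for g in groups.values():
        m = sum(g) / len(g)
        ssw += sum((v - m) ** 2 for v in g)
    return 1.0 - ssw / sst


# Hypothetical risk values and population-density classes for six grid cells.
risk = [1, 1, 2, 5, 5, 6]
density_class = ["low", "low", "low", "high", "high", "high"]
q_density = q_statistic(risk, density_class)
print(round(q_density, 3))
```

A q value close to 1 means the factor's strata explain almost all of the spatial variation in risk; a value close to 0 means they explain almost none of it.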

Interaction Detector
The interaction detection results for the different factors are shown in Table 5. The interaction of permanent population density with floating population density explains the epidemic risk level to the highest degree. With a q value of 0.71, the effect of the two factors after interaction is greater than that of either single factor. This also indicates that, by rationally controlling the permanent population and the floating population, the spread of the epidemic can be efficiently controlled and the risk of the epidemic effectively reduced.
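The comparison made by the interaction detector can be sketched as follows: the strata of two factors are overlaid, q is computed for the combined strata, and the result is compared with the two single-factor q values. The category names follow the taxonomy commonly used with the geodetector, which is an assumption here because the source does not list them; all function names and data are hypothetical.

```python
from collections import defaultdict


def q_statistic(values, strata):
    """Geodetector q = 1 - SSW / SST for one stratification."""
    n = len(values)
    mean = sum(values) / n
    sst = sum((v - mean) ** 2 for v in values)
    if sst == 0:
        raise ValueError("values have zero variance")
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)
    ssw = 0.0
    for g in groups.values():
        m = sum(g) / len(g)
        ssw += sum((v - m) ** 2 for v in g)
    return 1.0 - ssw / sst


def interaction_type(q1, q2, q12, tol=1e-9):
    """Classify the interaction (assumed, commonly used categories)."""
    lo, hi = min(q1, q2), max(q1, q2)
    if q12 < lo:
        return "nonlinear weaken"
    if q12 < hi:
        return "uni-variable weaken"
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    if q12 > q1 + q2:
        return "nonlinear enhance"
    return "bi-variable enhance"


# Hypothetical risk values with two stratified factors for eight grid cells.
risk = [1, 1, 3, 3, 5, 5, 9, 9]
permanent = [0, 0, 0, 0, 1, 1, 1, 1]  # permanent population density class
floating = [0, 0, 1, 1, 0, 0, 1, 1]   # floating population density class

q_perm = q_statistic(risk, permanent)
q_float = q_statistic(risk, floating)
# Overlay: each unit's stratum is the pair of its two class labels.
q_both = q_statistic(risk, list(zip(permanent, floating)))
kind = interaction_type(q_perm, q_float, q_both)
print(round(q_perm, 3), round(q_float, 3), round(q_both, 3), kind)
```

When the overlaid q exceeds both single-factor values, as in this toy data, the two factors jointly explain more of the risk distribution than either does alone.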

Ecological Detector
With the significance level of the F test set to 0.05, the ecological detector results in Table 6 are obtained, in which Y stands for a significant difference and N stands for a non-significant difference. In terms of the risk distribution of COVID-19, there is a significant difference between permanent population and the other spatial factors, which indicates that permanent population density is indeed the most important factor affecting epidemic risk, followed by floating population. Compared with the population factors, the other factors, including supermarkets, hotels and living markets, show no significant difference in COVID-19 risk distribution. This suggests that, as long as the density of the regional permanent population is rationally controlled, other public places will not directly increase the epidemic risk level. In other words, population is the direct contributor to the increase in the risk of COVID-19 in public places in urban spaces.
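The ecological detector compares two factors with an F statistic built from their within-stratum sums of squares. The sketch below assumes the form commonly given for the geodetector, F = [N1 (N2 - 1) SSW1] / [N2 (N1 - 1) SSW2], where N1 and N2 are the sample sizes of the two factors; the function names and data are hypothetical. The critical value at the 0.05 level must be taken from an F table or a statistics library and is not computed here.

```python
from collections import defaultdict


def within_sum_of_squares(values, strata):
    """Sum over strata of squared deviations from the stratum mean."""
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)
    ssw = 0.0
    for g in groups.values():
        m = sum(g) / len(g)
        ssw += sum((v - m) ** 2 for v in g)
    return ssw


def ecological_f(values, strata_1, strata_2):
    """F statistic comparing factor 1 with factor 2 (assumed formula)."""
    n1, n2 = len(strata_1), len(strata_2)
    ssw1 = within_sum_of_squares(values, strata_1)
    ssw2 = within_sum_of_squares(values, strata_2)
    if ssw2 == 0:
        raise ValueError("factor 2 has zero within-stratum variance")
    return (n1 * (n2 - 1) * ssw1) / (n2 * (n1 - 1) * ssw2)


# Hypothetical risk values and two stratified factors for eight grid cells.
risk = [1, 1, 3, 3, 5, 5, 9, 9]
permanent = [0, 0, 0, 0, 1, 1, 1, 1]
floating = [0, 0, 1, 1, 0, 0, 1, 1]

f_stat = ecological_f(risk, floating, permanent)
print(round(f_stat, 2))
# A "Y" entry corresponds to f_stat exceeding the critical F value
# for (n1 - 1, n2 - 1) degrees of freedom at the chosen level.
```

Because both factors are sampled on the same spatial units, N1 equals N2 in this example and the statistic reduces to the ratio of the two within-stratum sums of squares.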

[Table 6. Ecological detector. Factor labels: Population Flow; Population Density; Restaurant Density; Hotel Density; COVID-19 Fever Hospital Density.]
Y stands for a significant difference, while N stands for a non-significant difference.
The geodetector analysis of the factors affecting the risk level of COVID-19 shows that population is the direct factor affecting the level of epidemic risk. Under reasonable population control, public places will not directly cause an increase in the level of epidemic risk. Therefore, it is not necessary to implement complete lockdown policies and restrict the use of public spaces in large cities. As long as population density restrictions are implemented in public spaces and places with public uses, the risk level of the epidemic can be effectively controlled without affecting the operation of the city. Compared with the strict epidemic prevention measures adopted in public areas such as schools and administrative centers, restricting population density in transportation stations, restaurants and living spaces is a more convenient and effective choice. "Home quarantine" is the best epidemic prevention measure at present, because it greatly limits the movement and interaction of the population, which helps reduce population density in public space. The decline in population density in public space directly reduces the risk of the epidemic. In addition, there is no significant difference in the results of the Risk Factor Detector, Interaction Detector and Ecological Detector between Guangzhou and Wuhan, indicating that within urban space the factors influencing the risk level of the COVID-19 epidemic are the same, which makes the results of this study applicable to other cities.

Discussion
Using fused spatial geographical big data such as POI data and Tencent-Yichuxing data, this study assesses the epidemic risk level of Guangzhou and, by combining the logistic regression model with the geodetector model, analyzes the spatial differences among the urban spatial factors that affect the distribution of COVID-19. The logistic regression model calculates the contribution of the different factors affecting the epidemic in urban space and then simulates the final result. In fact, fitting a logistic regression model is a process of constantly seeking the optimal solution. Compared with other machine learning models, its calculation process is simpler and its results are expressed more directly. As for the geodetector, it detects the heterogeneity of the spatial distribution pattern between dependent and independent variables through the spatial differences between variables, and then measures the degree to which the variables explain one another. Therefore, compared with other statistical methods, the geodetector can better reflect the causal relationship between different variables.
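The description of logistic regression as a process of constantly seeking the optimal solution can be illustrated with a minimal gradient-descent sketch. This is not the study's implementation: the two features, the data, the learning rate and the number of iterations are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Predicted probability that a cell is high-risk."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss."""
    n, k = len(xs), len(xs[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # derivative of log-loss w.r.t. score
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        # Move against the average gradient: one step toward the optimum.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical normalized features per grid cell:
# [permanent population density, floating population density].
xs = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
ys = [0, 0, 0, 1, 1, 1]  # 1 = high-risk cell (assumed labels)

w, b = fit_logistic(xs, ys)
probs = [predict(w, b, x) for x in xs]
print([round(p, 2) for p in probs])
```

The fitted probabilities can be thresholded, for example at 0.5, to assign each cell a risk class, and the coefficients in `w` indicate the direction in which each factor shifts the predicted risk.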
Since the outbreak of COVID-19, research on the epidemic has mainly been carried out from the perspective of population mobility [79], whether at the regional, national or global scale. Population mobility has indeed played an important role in epidemic risk assessment, and it has been shown to be one of the most crucial factors influencing epidemic risk [7]. However, population mobility is not the only factor that results in a higher risk of COVID-19, which is why this study assesses the risk level of the epidemic by comprehensively considering population, public places and other open spatial factors. Using fused spatial geographical big data such as POI data and Tencent-Yichuxing data, this study not only assesses the risk level but also explores the primary and secondary factors that affect epidemic risk and the interrelationships among these factors. In addition, the verification results show that the epidemic risk assessment conducted in this study is highly accurate.
The high-risk areas of the epidemic are mainly concentrated in areas with a denser permanent population and more frequent interaction among people, which is consistent with previous research [7,19]. However, compared with previous research, this study not only takes various urban spatial factors into account to discuss the degree of influence of each factor comprehensively, but also analyzes the distribution of the epidemic risk level comprehensively and objectively, which could be instrumental in the targeted prevention and control of regional epidemics.
Compared with existing studies on the distribution of epidemic risk [15,17], the main contribution of this study lies in its research methods and ideas. First, in terms of research methods, this study uses the geodetector to analyze the urban spatial factors that affect the risk of the epidemic, derives the primary and secondary factors, and analyzes the correlation between different factors. Second, from the geographical perspective, compared with other epidemic research, this study explores the distribution of the epidemic risk level in urban space, which highlights the role and influence of urban space in the spread of the epidemic. The results of the study not only contribute to the formulation of urban epidemic prevention policies, but also play a positive role in guiding the risk prevention and control of COVID-19.
In the prevention, control and management of the epidemic, individuals, families and countries alike have made great efforts and sacrifices to defeat it. From the policy point of view, restricting population mobility undoubtedly affects communication among people in different ways; home quarantine in particular restricts communication among individuals, families and cities to a certain extent, mainly through restrictions on population movement. By comparing Tables 4-6, it can be found that population density (permanent population and floating population) is the main factor affecting the risk of the epidemic. Therefore, reducing population density can effectively reduce the risk level of the epidemic and contribute substantially to its management, as demonstrated in this study. If there were no restrictions among individuals, families and cities, the risk of the epidemic would increase sharply. Therefore, this study also has theoretical guidance value for epidemic prevention and control.
There is no doubt that this study still has deficiencies and room for improvement. Although COVID-19 has been brought under effective control and people's lives have gradually returned to normal, more meticulous assessment and analysis are still necessary. To prevent a second outbreak of COVID-19 more efficiently, especially in winter when the virus will be more active, it is essential to carry out simulation analysis of the global pandemic.

Conclusions
Assessing the distribution of the COVID-19 risk level is of great practical significance to both the protection of public health security and the sustainable development of urban areas. Using spatial geographical big data such as POI data and Tencent-Yichuxing data, this study assessed the epidemic risk level of COVID-19 and, by combining the logistic regression model with the geodetector model, analyzed the main factors that affect the distribution of COVID-19 in Guangzhou as well as the interrelationships among these factors. In addition, the confusion matrix, the ROC curve and a case study were used for verification. Finally, the following conclusions were obtained: (1) The high-risk areas are mainly concentrated in areas with a higher density of permanent population, such as Haizhu district and Yuexiu district. On the one hand, although COVID-19 has been brought under control in mainland China, regions with high population density and frequent population mobility and interaction are still at high risk of COVID-19. On the other hand, although the policy of home quarantine cuts off the transmission routes among individuals, families and cities, there is still a high probability of COVID-19 infection in densely populated areas. Therefore, the focus of epidemic prevention should be to control the regional population density rationally. (2) The most influential factor affecting the epidemic risk level is permanent population, followed by floating population. The interaction between the permanent population and the floating population explains the distribution of the epidemic risk level to the highest degree, indicating that reasonable control of population agglomeration and interaction has a significant effect on the prevention and control of the epidemic. (3) During the epidemic period, the demand of the urban population for transportation, catering and daily living determines the impact of transportation stations, restaurants and living space on the epidemic risk. The density of fever clinics and hotels has a relatively lower impact on the risk of the epidemic, and timely treatment and quarantine could reduce the risk level of the epidemic.
From the perspective of geography, this study explores the different spatial factors that affect the risk level of the epidemic and the correlation between these factors, which plays a certain role in the prevention and control of the epidemic. First, this study can be instrumental in the accurate assessment of areas with higher epidemic risk. Second, it can help classify regional priorities in epidemic prevention rationally. Third, it can support the prevention and control of COVID-19. Last but not least, it can contribute a responsible approach to maintaining urban public health security and achieving sustainable development of the city.