Methods for Infectious Disease Risk Assessments in Megacities Using the Urban Resilience Theory

Abstract: Since the 20th century began, the world has witnessed the emergence of contagious diseases such as Severe Acute Respiratory Syndrome (SARS), H1N1 influenza, and the recent COVID-19 pandemic. Conducting timely infectious disease risk assessments is important for preventing the spread of viruses, safeguarding public health, and achieving sustainable development. Most current studies on epidemic risk assessments focus on administrative divisions, making it difficult to reflect the risk disparities within these areas. Taking Shanghai as an example, this research introduces the concept of urban resilience and identifies the risk factors. The interactions among the risk factors are analyzed using geographic detectors, and the relationship between the risk factors and the distribution of newly reported cases is established using Geographically Weighted Regression. A risk assessment model is then constructed to evaluate the infection risk within different regions of the administrative area. The results show that the central area of Shanghai exhibits the highest infection risk, which gradually decreases toward the periphery. The Spearman's correlation coefficient (ρ) between the predicted and actual distribution of new cases reaches 0.869 (p < 0.001), and the coefficient of determination (R²) is 0.938 (p < 0.001), indicating a relatively accurate assessment of infection risk in different spatial areas. This methodology can be applied to infectious disease risk assessments during public health emergencies, thereby assisting in the formulation of epidemic prevention policies.


Introduction
Since the 20th century began, infectious diseases such as Severe Acute Respiratory Syndrome (SARS), H1N1 influenza, and novel coronaviruses have rapidly spread worldwide. These diseases have significantly impacted economic and social development and have persisted on a global scale. Megacities in particular are densely populated regions that integrate geographical, political, economic, social, and cultural functions, and they combine abundant resources with immense pandemic pressures and risks. Ensuring the safety, health, and stability of these megacities is of paramount importance. When confronted with infectious diseases, timely and comprehensive risk assessments play a crucial role in preventing virus transmission, safeguarding public health, ensuring the security of megacities, and realizing sustainable development. The dynamic interconnections between various factors necessitate a meticulous evaluation to effectively combat the challenges posed by these diseases [1].
Infectious disease risk assessment refers to the use of existing information by health institutions to assess the level of threat posed by an epidemic and provide risk warnings. However, most current studies on epidemic risk assessments are based on administrative regions [2], which makes it difficult to reflect the differences in infection risks within these regions [3]. Therefore, conducting fine-grained infectious disease risk assessment studies is essential for the precise management of epidemics within administrative regions, safeguarding public health, and achieving sustainable development.
Researchers have proposed various models for infectious disease risk assessment, such as the Susceptible-Exposed-Infectious-Recovered (SEIR) model, which uses the number of cases and population contact to construct differential equations [4]. The Pressure-State-Response (PSR) model combines multiple risk factors to assess the epidemic risk [5]. The Long Short-Term Memory (LSTM) model has been used to assess risks by exploring time-series information on disease infections [6,7]. However, these models often evaluate risks at the administrative level and pay less attention to the spatial distribution of risks within administrative regions [8,9]. Different areas within administrative regions often exhibit varying risks [10], such as the differences in infection distribution between densely populated and sparsely populated areas [11].
The concept of "urban resilience" has opened up new avenues for epidemic risk assessments. Urban resilience refers to the ability of a city or urban system to absorb and withstand external shocks, maintaining its key features and functions without significant impact. When dealing with infectious diseases, different risks are often observed within urban areas due to varying external impacts and resistance capabilities. Using the "urban resilience" theory to construct models for calculating epidemic risks helps clarify the mechanisms underlying epidemic risks, enabling the scientific calculation of the impact on and resistance of a city when facing infectious diseases, which in turn determines the accuracy of the model.
Previous studies have indicated that the impact force on a city during an epidemic is mainly determined by the number of newly infected individuals, while resistance is primarily influenced by the population, transportation, and aggregation in proximity to the patients [12,13]. The fine-grained representation of the spatial distribution of new infections and the density of surrounding populations is crucial for using the "urban resilience" theory to assess epidemic risks in a granular manner [14,15]. Geospatial big data can effectively represent the population, transportation, and aggregation in various areas within a city, and are therefore widely used in spatial health analysis and research [16].
Researchers such as MF [17] have used geospatial big data to construct epidemic tree models to determine the basic reproduction numbers of different spatial epidemics. Xia Jizhe et al. [18] used geospatial big data to correct the transmission parameters of population dynamics models. Yao Xiao et al. [10] employed geospatial big data and random forest models to classify the risk of epidemic transmission in different areas within administrative regions, yielding favorable results.
Furthermore, in recent years, a growing body of research has used the addresses of new patients and geocoding techniques for the fine-grained spatial localization of epidemic patients [19–21]. For instance, Hu Tao et al. used geocoding techniques to map the distribution of liver diseases at a fine-grained city level [22], and Peng Ming Jun employed weighted geocoding techniques to map the community distribution of COVID-19 patients within a city [23].
According to the laws of geography, resilience indicators of the same size often have different effects in different spatial contexts. Geographically Weighted Regression (GWR) models have achieved good results in modeling spatially varying effects. The GWR model explores the spatial variations and related influencing factors of diseases at a certain scale by establishing local regression equations at each point within the spatial extent, and it can be used to assess the future development of diseases. Because it considers the local influences and effects of spatial objects, it exhibits higher accuracy.
Therefore, to address the difficulty of reflecting the differences in risk within administrative regions, this study introduces the concept of "urban resilience". Using Shanghai as an example and using geocoding techniques to pinpoint the fine-grained distribution of patients, this study characterizes the impact force indicators faced by the city during an epidemic. Furthermore, this study combines grid-level data on diagnosed patients (GLD) obtained using geocoding techniques with geospatial big data such as population density (PD), points of interest (POI), and road network (RD) data to comprehensively construct risk factors (RFS).
This study establishes an epidemic infection risk assessment framework and analyzes the interactions among the RFS using geographic detectors. Finally, the GWR model is used to model the relationship between the RFS and the distribution of new cases and thereby construct the risk assessment model. The assessment results are then correlated with the actual distribution of cases to validate the model.

Materials and Methods
The pandemic risk within different regions of a city is intricately linked to its geographical characteristics. Large urban centers experience variations in infection rates among different areas due to differences in population size, the presence of gathering places such as supermarkets and public squares, and disparities in transportation infrastructure. Considering these pivotal factors, our research employs grid-level data on diagnosed patients (GLD), population density (PD), points of interest (POI) data, and road network (RD) data to create pandemic risk factors.

Study Area
Shanghai is located at the mouth of the Yangtze River on the central coast of mainland China and is divided into 16 districts. From March 2022, Shanghai experienced a sharp increase in the cumulative number of confirmed COVID-19 cases and was significantly impacted by the pandemic. Therefore, Shanghai was chosen as the study area because it is representative of the outbreak.

Data Sources
The data used in this study include geospatial big data and grid-level data on newly diagnosed patients.

Geospatial Big Data
The spatial distribution of populations is highly correlated with factors such as transportation and clustering hotspots. Combining the corresponding geospatial data helps to accurately characterize the density of populations at the grid level and to quantify population clustering characteristics.
This study selected POI data, road network data, and population density data from geospatial big data. POI data are highly correlated with population clustering hotspots. Supermarkets, public places, and public transportation hubs still attracted typical population clusters during the epidemic. Therefore, this study obtained POI data from Baidu Maps, including the public services, shopping services, and transportation services categories, for Shanghai in 2022, totaling 109,237 records.
Additionally, through a grid-based analysis, hotspot areas of population clustering were divided into units, and each grid value represented the number of clustering hotspots in that area, indicating the attractiveness of geographical grid regions for population clustering.
Population density directly reflects the degree of population aggregation and is closely related to disease transmission. The data were obtained from the LandScan Global Population Database (https://landscan.ornl.gov/, accessed on 1 May 2022), which aims to provide high-precision spatial population data for risk assessments. In this study, it was aligned with the data from the seventh national census for calibration purposes.
The distribution of road networks exhibits a strong spatial correlation with population distribution [2]. Road network data were sourced from OpenStreetMap (https://www.openstreetmap.org, accessed on 1 May 2022). To meet the requirements of quantitative analysis, primary, secondary, and urban arterial roads were selected, and a line density analysis was carried out to convert them into grid format.

Grid-Level Data on Diagnosed Patients
The data were obtained from the daily announcements by the Shanghai Municipal Health Commission (sh.gov.cn, accessed on 1 May 2022) regarding the residential information of the cases. This study used web scraping to obtain a total of 150,546 records of patients' residential information, with a higher number of newly infected individuals between 1 April and 14 May 2022.
Furthermore, the study used the geocoding technology available in the Baidu Maps API to obtain high-precision spatial location information for the cases. This technology converts the addresses of the cases into spatial coordinates. Finally, the ArcGIS tool was used to add XY coordinates and spatialize the case data at a finer granularity. For quantitative analysis at the grid level, the patient community distribution data were divided into 1 km grids using the geographical grid method, and GLD data were generated using geographic grid sampling, with each grid value representing the number of cases in that area.
Additionally, the distribution of new cases within each grid was used to indicate the risk of infection. Taking April 1 as an example, the resulting case distribution data are shown in Figure 1.
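The gridding step described above can be illustrated with a minimal sketch. It assumes that the geocoded case coordinates have already been projected to a planar coordinate system in metres; the function name `grid_counts`, the grid origin, and the sample coordinates are hypothetical and are not taken from the study, which performed this step in ArcGIS.

```python
from collections import Counter
from math import floor


def grid_counts(points, cell_size=1000.0, origin=(0.0, 0.0)):
    """Count points per square grid cell.

    points    : iterable of (x, y) in a projected CRS, in metres (assumption)
    cell_size : grid resolution in metres (1 km, as in the text)
    origin    : lower-left corner of the grid (hypothetical default)
    """
    x0, y0 = origin
    counts = Counter()
    for x, y in points:
        col = floor((x - x0) / cell_size)
        row = floor((y - y0) / cell_size)
        counts[(row, col)] += 1  # each cell value is the number of cases
    return counts


# Illustrative coordinates only, not real case locations.
cases = [(120.0, 80.0), (950.0, 999.0), (1500.0, 200.0), (1001.0, 2500.0)]
gld = grid_counts(cases)
```

The same counting scheme could in principle be applied to POI records to obtain the number of clustering hotspots per grid cell.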
The incubation period of a general coronavirus infection is typically around 14 days. Therefore, a 14-day period can be designated as an analytical cycle for studying the distribution of new cases.
In this study, the epidemiological data obtained from Shanghai are divided into three periods: April 1 to April 15, April 16 to April 30, and May 1 to May 14 in 2022. The first two periods are used for detecting interactions and establishing the evaluation model. The third period is used for model validation.
Additionally, we conducted a multicollinearity test on the selected indicators using the Variance Inflation Factor (VIF). The test yielded VIF values of 6.7, 3.8, 4.4, and 2.7 for the indicators GLD, PD, POI, and RD, respectively. All these values are less than 10, indicating the absence of severe multicollinearity issues at a tolerance level of 0.1.
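As an illustration of the multicollinearity check, the following sketch computes the VIF of each indicator by regressing it on the remaining indicators, using VIF = 1 / (1 - R²). Since tolerance is the reciprocal of the VIF, a VIF threshold of 10 corresponds to the tolerance level of 0.1 mentioned above. The function name `vif` and the synthetic data are assumptions made for demonstration; they are not the study's indicators or software.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


# Synthetic example: column c is nearly collinear with column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
values = vif(np.column_stack([a, b, c]))
```

In this synthetic case the two nearly collinear columns receive large VIF values, whereas the independent column stays close to 1.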

Risk Assessment Model Establishment Methods
The experimental flowchart is shown in Figure 2.
Within the framework of resilient cities theory, the risk faced by different regions within a city in dealing with infectious diseases primarily consists of two elements: shocks and resilience. The examples depicted in Figure 3a,b illustrate the methodology for analyzing epidemic risks under the theory of urban resilience. The region in Figure 3a, which has a low resilience level, often manifests a higher risk when facing the same shocks than the region in Figure 3b, which has a high resilience level and consequently shows lower risk. Furthermore, within the same region, a stronger shock generates a greater risk.
Therefore, this study characterizes the impact indicators of different regions within a city in the face of an epidemic by using patient distribution data at the grid scale (grid-level data). Additionally, geospatial big data such as PD, POI data, and RD are employed as resilience indicators within the framework. The combination of impact and resilience constructs the RFS.

RFS Interaction Detection Method
The geographic detector technique allows for the exploration of the interaction between RFS [6]. It is used to assess the coupling relationship between RFS and the distribution of new cases. One advantage of the geographic detector is that it does not assume linearity and has clear physical interpretations. The quantitative evaluation of the results is represented by the q-value, which reflects the similarity of spatial patterns among different factors. The change in q-values before and after RFS interactions is used to evaluate the coupling relationship between various indicators. The q-value is calculated using the following formula:
$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2}$$
Here, $h = 1, 2, \ldots, L$ represents the stratification of the independent variable X or the dependent variable Y. $N_h$ and $N$ are the number of units in stratum $h$ and the entire region, respectively. $\sigma_h^2$ and $\sigma^2$ are the variances of the Y values in stratum $h$ and the entire region, respectively.
In this study, the "GD" package in the R language is used to perform the geographic detector analysis. The RFS are treated as explanatory variables (X) and the distribution of new cases is the variable of interest (Y). The variables are stratified according to the optimal stratification scheme provided. After calculating the q-value for individual factors, q(X1∩X2) is computed to analyze the interaction between factors in space. If q(X1∩X2) > Max(q(X1), q(X2)), this indicates an enhanced interaction between the two factors. If q(X1∩X2) < Min(q(X1), q(X2)) or Min(q(X1), q(X2)) < q(X1∩X2) < Max(q(X1), q(X2)), this suggests a weakened interaction between the two factors.
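To make the q-value and the interaction rule concrete, a minimal Python sketch is given here. It is not the interface of the R "GD" package used in the study; the function names and the toy data are hypothetical, the explanatory variables are assumed to be already discretized into strata, and ties in the comparison are treated as not enhanced, which the rule above leaves unspecified.

```python
from collections import defaultdict


def q_statistic(y, strata):
    """Geographic detector q = 1 - sum_h(N_h * var_h) / (N * var)."""
    def ss(values):
        # Sum of squared deviations, i.e. N times the population variance.
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    return 1.0 - sum(ss(g) for g in groups.values()) / ss(y)


def interaction_q(y, strata_a, strata_b):
    """q of the overlay X1 ∩ X2: each stratum is a pair of labels."""
    return q_statistic(y, list(zip(strata_a, strata_b)))


def classify(q1, q2, q12):
    """Two-way reading from the text; ties count as not enhanced (assumption)."""
    return "enhanced" if q12 > max(q1, q2) else "weakened"


# Toy data: y is the number of new cases, x1 and x2 are stratified factors.
y = [1, 2, 1, 2, 5, 6, 5, 6]
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
q1 = q_statistic(y, x1)
q2 = q_statistic(y, x2)
q12 = interaction_q(y, x1, x2)
label = classify(q1, q2, q12)
```

In the toy data, the overlay of the two factors separates the values of Y completely, so the interaction q-value exceeds both single-factor q-values.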

Establishment Method of Risk Factors and Distribution of New Cases
Establishing the relationship between RFS and the distribution of new cases involves the use of Geographically Weighted Regression (GWR) models, which are essential tools for explaining the spatial distribution of diseases [7–9]. These models analyze the spatial heterogeneity of the impact through the distribution of regression coefficients and perform a risk assessment based on the fitting relationship. By incorporating a spatial weighting function, GWR models link grid points with neighboring areas and perform regression modeling in each partition.
Compared to the Ordinary Least Squares (OLS) model, GWR models can more effectively consider the influence of geographic neighbors and the heterogeneity of the impact factors. By using the GWR model, the neighborhood case distribution and population characteristics, as well as the heterogeneous influence levels of the factors in different regions, can be adequately considered. This provides a better explanation of the spatial distribution of RFS and new cases.
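To eliminate the influence of data dimensionality, the RFS are standardized before modeling. The GWR model is expressed as:
$$\mathrm{Risk}_l = \beta_{gw0}(u_l, v_l) + \sum_{i} \beta_{gwi}(u_l, v_l)\, x_{il} + \varepsilon_l$$
Here, $(u_l, v_l)$ represents the spatial location of the $l$-th sample, and $\mathrm{Risk}_l$ and $x_{il}$ represent the risk and the value of the $i$-th RFS at the $l$-th spatial location, respectively. $\beta_{gw0}(u_l, v_l)$ is the local intercept, and $\beta_{gwi}(u_l, v_l)$ represents the regression coefficient of the $i$-th independent variable for the $l$-th sample in space. $\varepsilon_l$ is the random error, following a normal distribution.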
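The local regression idea behind GWR can be sketched as a weighted least squares fit repeated at every sample location. The sketch below is a simplified illustration rather than the study's implementation: the Gaussian kernel, the fixed bandwidth, the function name `gwr_fit`, and the synthetic data are all assumptions, since the kernel and bandwidth selection are not specified in the text.

```python
import numpy as np


def gwr_fit(coords, X, y, bandwidth):
    """Local weighted least squares at every sample location.

    Returns an (n, k + 1) array: local intercept followed by k local slopes.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    betas = np.empty((n, A.shape[1]))
    for l in range(n):
        d = np.linalg.norm(coords - coords[l], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel (assumed)
        sw = np.sqrt(w)
        # Weighted least squares via row scaling by sqrt(weight).
        betas[l], *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return betas


# Synthetic 10 x 10 grid whose true slope grows with the first coordinate u.
coords = np.array([(u, v) for u in range(10) for v in range(10)], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (1.0 + 0.1 * coords[:, 0]) * x
betas = gwr_fit(coords, x, y, bandwidth=2.0)
```

Because the coefficients are estimated separately at each location, the fitted slopes in this example increase with u, which is the kind of spatial heterogeneity that a single global OLS coefficient cannot represent.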

Accuracy Test Method
To assess the infection risk in period 3 of the study area, the RFS during period 2 were used as explanatory variables in the evaluation model.
The relative magnitude of the risk index obtained from the model was used to assess the level of infection risk among different regions within the administrative area [10].
Additionally, to validate the accuracy of the risk assessment model, the evaluation results were subjected to correlation analysis with the actual distribution of cases, and the Spearman correlation coefficient (ρ) and the coefficient of determination (R²) for the linear regression relationship between the two were calculated. The Spearman correlation coefficient (ρ) quantitatively evaluates the ordinal relationship between two sets of data distributions [11], determining whether there is a higher number of new cases in areas with higher risk indexes.
The coefficient of determination (R²) assesses the explanatory power of the heterogeneous distribution of risk indexes on the heterogeneous distribution of actual new cases by calculating the extent to which the variation of the independent variable explains the variation of the dependent variable [12]. The calculation formulas are provided below:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
Here, $d_i$ represents the difference between the rank of the risk index of grid region $i$ and the rank of its number of new cases, and $n$ represents the sample size. $y_i$ represents the actual distribution of cases, $\hat{y}_i$ is the regression-fitted value using the evaluated risk index, and $\bar{y}$ is the mean of $y_i$.
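Both validation metrics can be computed directly from their definitions, as in the following sketch. The rank-difference form of Spearman's coefficient is exact only when there are no tied values, which the sketch assumes; the function names and the five sample values are hypothetical and are not results from the study.

```python
def ranks(values):
    """Ranks 1..n in ascending order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(a, b):
    """Spearman coefficient from squared rank differences (no ties)."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def r_squared(x, y):
    """R² of the simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


# Illustrative values: assessed risk index and observed new cases per grid.
risk = [0.2, 1.5, 3.1, 4.0, 7.2]
cases = [1, 4, 3, 9, 15]
rho = spearman_rho(risk, cases)
r2 = r_squared(risk, cases)
```

For real grid data with many tied case counts, a tie-aware rank correlation would be the safer choice.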

Analysis of Risk Assessment Model Results
In this study, GLD was considered as x1, PD as x2, POI as x3, and RD as x4. The results of the single-factor explanatory power (q-value) and its significance (p-value) are presented in Table 1, while the results of the RFS interactions are shown in Table 2.
Table 1. q-values of single factors (GLD was considered as x1, PD as x2, POI as x3, and RD as x4).

Based on the q-values of single factors (Table 1), the highest explanatory power is observed for patient distribution, reaching 0.813. This indicates that the spatial distribution of cumulative cases is the main factor influencing the spatial distribution of future new cases. The more cumulative cases a region has, the higher its number of future new cases.
The population density factor follows, with a q-value of 0.72, which is slightly lower than the patient distribution factor but still at a relatively high level. It reflects a high similarity between areas with high/low population density and areas with high/low numbers of new cases. Therefore, areas with higher population density have more patients and higher risks.
The q-value of cluster hotspots POI reaches 0.536, indicating that regions with more clustering hotspots generally have a higher number of case distributions.
The factor with the lowest explanatory power is road network density, with a q-value of 0.111, and it also exhibits lower significance.
An analysis of the interaction results (Table 2, Figure 4) reveals that after interacting with population density, patient distribution exhibits a higher explanatory power (0.912) compared to its individual factor (0.813). Moreover, when interacting with road network density and cluster hotspot indicators, the explanatory power for the distribution of new patients is enhanced, reaching 0.911 and 0.822, respectively.
Although road network density alone shows lower explanatory power, its interaction with cluster hotspots demonstrates significant non-linear enhancement. The dense transportation network facilitates population flow toward clustering hotspots, leading to a substantial increase in regional infection risk through interaction. The interaction between various indicators of the risk index enhances their explanatory power, demonstrating a synergistic effect. Therefore, combining patient distribution data with geographical big data can better explain the spatial heterogeneity of patient distribution.

Analysis of the Relationship between RFS and the Distribution of New Patients
The relationship between RFS and the distribution of new cases (Figure 5) was fitted using the Geographically Weighted Regression (GWR) model. All variables of the RFS passed the significance test at the 0.05 significance level. The fitted coefficient of determination (R²) was 0.903 (p < 0.001). The influence coefficients of the RFS variables were categorized using the natural breaks classification method and visualized for analysis (Figure 6). The parameter estimates of each indicator varied markedly across grid units in different regions. Overall, most indicators showed positive regression coefficients, and the impact of the RFS on the spatial distribution of new patients varied strongly across space.
The high-value areas of the fitted coefficient between the infected population distribution and population density were primarily concentrated in the city center. The impact decreased gradually from the center to the surrounding areas.
Table 3 illustrates the spatial distribution statistics of coefficients. It is evident that the highest coefficient corresponds to the distribution of patients from the previous time period, with a maximum value of 1.28. In contrast, the coefficients for the factors PD, POI, and RD exhibit close statistical values.
The spatial distribution of the influence coefficient of POI displayed a zonal pattern from south to north, with relatively small overall variations and almost no significant spatial heterogeneity. The impact of aggregated hotspots was not strongly associated with whether they were located in the city center or suburban areas.
The population in both suburban and central areas of the city resided in environments with a higher risk of susceptibility. Regarding the influence of road network density, the coefficient was largest in the city center and decreased toward the surrounding areas. However, negative values appeared in areas closer to the city center, which could be attributed to the proximity of these regions to the city center and the influx of population predominantly concentrated in the central area.

Model Accuracy Evaluation
In the evaluation model, the RFS within period 2 were input as explanatory variables to assess the risk of new COVID-19 infections in various regions during the next period, period 3. Based on the assessment of the risk of infection (Figure 7a), the risk index decreased in intensity from the center to the periphery. The Huangpu District, situated in the central region, had the highest infection risk index, surpassing 7.
As shown in Figure 9a,b, it is evident that in Shanghai's core areas, the model achieved a coefficient of determination (R2) of 0.943.In contrast, in other areas, the model's R2 was 0.826, which is noticeably lower than the 0.943 in core areas.This clearly indicates that the model exhibits higher precision in high-population density core areas.A correlation analysis was conducted between the assessment results and the spatial distribution of actual new cases within the corresponding time period (Figure 7b), resulting in a scatter plot of the correlation (Figure 8).Overall, both the coefficient of determination (R2) and the Spearman correlation coefficient were found to be at a relatively high level.With an R2 value of 0.938 (p < 0.01), the heterogeneous distribution of the assessed risk index can effectively explain the spatial heterogeneity of newly infected individuals.According to the Spearman correlation coefficient of 0.869 (p < 0.01), there is a good correlation between the risk index and the number of patient distributions, indicating that in the high-value assessment areas, the number of new cases also tends to be high.To comprehensively investigate the variations in model performance across differen regions, our study employed a geographical division of Shanghai based on national stand ards.We conducted a detailed analysis of the model's accuracy discrepancies within th Specifically, several grid cells in the Huangpu District had standardized risk indices exceeding 10, and the number of actual new cases in residential areas was also the highest, all of which fell within the 95% confidence ellipse, indicating a strong correlation in the high-value areas.However, in some low-value areas, the risk assessment appeared to be overestimated for certain regions, which was possibly due to higher road network density and population density.Nevertheless, most of the low-value areas also fell within the 95% confidence ellipse.In general, both the model fit and the 
ordinal correlation were quite good.Therefore, the model achieved good results by integrating patient distribution data with geographic big data related to population aggregations and mobility patterns.
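The two agreement measures used above can be illustrated with a minimal sketch. It assumes that R2 is the squared Pearson correlation of a simple linear fit between the risk index and the case counts, which this section does not state explicitly; the function names and the six sample values are hypothetical and are not data from the study.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def r_squared_linear(x, y):
    """R2 of a simple linear fit (assumed definition, see lead-in)."""
    return pearson(x, y) ** 2

# Hypothetical values for six grid cells (illustration only).
risk_index = [0.4, 1.1, 2.3, 3.0, 6.8, 10.2]
new_cases = [0, 2, 1, 5, 14, 25]

rho = spearman(risk_index, new_cases)
r2 = r_squared_linear(risk_index, new_cases)
print(f"Spearman rho = {rho:.3f}, R2 = {r2:.3f}")
```

Because the Spearman coefficient depends only on ranks, it is less sensitive than R2 to the few very high values found in cells such as those in Huangpu District.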
To investigate how model performance varies across regions, we divided Shanghai geographically according to national standards and analyzed the differences in model accuracy between the central urban areas and the other regions. The central urban areas comprise seven administrative districts: Huangpu, Xuhui, Changning, Yangpu, Hongkou, Putuo, and Jing'an. The remaining regions were classified as non-core areas. The central areas have a significantly higher population density than the non-core areas.
As shown in Figure 9a,b, the model achieved a coefficient of determination (R2) of 0.943 in Shanghai's core areas but only 0.826 in the other areas. This indicates that the model is more accurate in the densely populated core areas.

Model Advantages and Potential for Large-Scale Applications
We found that GLD is the most critical factor in generating epidemic risk (Table 3), so incorporating GLD indicators enhances the model's reliability. Detailed data on patient distribution are frequently overlooked in risk research, as in numerous prior studies [12,14–19]. In this study, the residential addresses of patients were geocoded to obtain fine-grained patient distribution data as a risk factor, significantly strengthening the scientific foundation of our risk factor analysis [19–22].
In Figure 9, the model is more accurate in the central areas of the study region. This higher accuracy can be attributed to the more concentrated distribution of patients in these central areas, consistent with the view of Yao Xiao [10]. This suggests that the model may be particularly well suited to urban core regions.
This study uses a 1 km × 1 km spatial scale for risk assessment, providing a more nuanced representation of risk variations within administrative regions. Unlike the numerous studies conducted at a macro scale, as illustrated in Table 4, this study delineates risk variations within urban areas, contributing to the formulation of specific prevention and control policies [24–31]. Therefore, the risk assessment of our model significantly aids the fine-grained management of epidemic risks at a micro level.

Related Research | Spatial Scale Level
Xu et al. [30] | Provincial level
Wei et al. [31] | District level
Our study | 1 km × 1 km grid level
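To make the 1 km × 1 km grid level concrete, the following minimal sketch counts geocoded case locations per grid cell. It assumes the coordinates have already been projected to a metric system; the projection, the grid origin, the function names, and the sample points are hypothetical and are not taken from the study.

```python
from collections import Counter

CELL_SIZE_M = 1000.0  # 1 km x 1 km cells, matching the grid level used in the study

def cell_index(x_m, y_m, origin_x=0.0, origin_y=0.0, cell_size=CELL_SIZE_M):
    """Return the (column, row) of the grid cell containing a projected point."""
    col = int((x_m - origin_x) // cell_size)
    row = int((y_m - origin_y) // cell_size)
    return col, row

def count_cases_per_cell(points, **grid):
    """Count how many case locations fall into each grid cell."""
    return Counter(cell_index(x, y, **grid) for x, y in points)

# Hypothetical geocoded residential addresses in projected coordinates (meters).
cases = [
    (120.0, 950.0),
    (980.0, 10.0),
    (1500.0, 200.0),
    (1999.9, 999.9),
    (2500.0, 3100.0),
]

counts = count_cases_per_cell(cases)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Floor division keeps points with negative offsets from the origin in the correct cell, which matters when the grid origin is not at the corner of the study area.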

Implications of Research Results for Epidemic Prevention and Control
As shown in Table 1, new cases in the past 14 days emerge as the most influential factor on epidemic risk. This finding aligns with the existing understanding of the fundamental spatial distribution pattern of COVID-19 [32–34]. It emphasizes the need to address the dynamic risks associated with the geographical locations of infected individuals, which helps reduce the substantial risk that these individuals pose for further spread. Population density is the second most influential factor, indicating that densely populated areas are more susceptible to disease outbreak and spread [35,36]. Consequently, controlling densely populated areas is a critical focus for enhanced epidemic prevention and management.
Table 2 reveals stronger interactions among various risk factors, particularly the nonlinear enhancement between road networks and the Points of Interest (POI) of gathering hotspots. This highlights the need to prioritize the control of highly interconnected areas within road networks as a strategic measure to mitigate the risk of disease spread during epidemics.
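The notion of nonlinear enhancement can be illustrated with a minimal sketch of the geographic detector's q statistic and its standard interaction classification, in which two factors enhance each other nonlinearly when the q value of their overlay exceeds the sum of their individual q values. The sketch assumes each factor has already been discretized into classes; the variable names and the toy data are hypothetical and are not results from the study.

```python
from collections import defaultdict

def q_statistic(strata, y):
    """Factor detector: q = 1 - (within-stratum sum of squares) / (total sum of squares)."""
    n = len(y)
    mean = sum(y) / n
    total = sum((v - mean) ** 2 for v in y)
    groups = defaultdict(list)
    for s, v in zip(strata, y):
        groups[s].append(v)
    within = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        within += sum((v - m) ** 2 for v in vals)
    return 1.0 - within / total

def interaction_type(q1, q2, q12, tol=1e-9):
    """Classify an interaction from the two single-factor q values and the overlay q."""
    lo, hi = min(q1, q2), max(q1, q2)
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    if q12 > q1 + q2:
        return "nonlinear enhancement"
    if q12 > hi:
        return "bivariate enhancement"
    if q12 < lo:
        return "nonlinear weakening"
    return "single-factor nonlinear weakening"

# Hypothetical discretized factors for eight grid cells: cases are high only
# where both road density class and POI class are high.
road_class = [0, 0, 1, 1, 0, 0, 1, 1]
poi_class = [0, 1, 0, 1, 0, 1, 0, 1]
new_cases = [1, 1, 1, 9, 1, 1, 1, 9]

q_road = q_statistic(road_class, new_cases)
q_poi = q_statistic(poi_class, new_cases)
q_both = q_statistic(list(zip(road_class, poi_class)), new_cases)  # overlay of both factors
print(q_road, q_poi, q_both, interaction_type(q_road, q_poi, q_both))
```

In this toy case each factor alone explains little of the variation, while their overlay explains all of it, which is the pattern that the interaction detector labels as nonlinear enhancement.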
In Figure 5, the GWR regression coefficients show significant spatial heterogeneity, with notable differences in the GLD and Population Density (PD) coefficients. To better understand this heterogeneity, the mean coefficients were calculated separately for the core and non-core areas (Figure 10). The results indicated higher regression coefficients in the core areas. This can be attributed to the higher population density [15], the presence of densely populated spaces [18], and the extensive distribution of complex transportation networks in urban center areas [27]. The synergistic interaction of these factors creates favorable conditions for the spatial spread of diseases [37]. Consequently, with higher population density and numerous gathering places, the infection risk is higher in core areas [38]. This emphasizes the urgent need, especially during pandemics, for targeted intervention measures designed to address the escalating risks within urban core zones [39,40].
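The comparison of mean local coefficients between core and non-core areas can be sketched as follows. This is a deliberately simplified illustration, not the model used in the study: it fits a single explanatory variable with a fixed Gaussian kernel, whereas the study uses several risk factors, and the kernel, the bandwidth, the function names, and the toy data are all assumptions made for this example.

```python
from math import exp, hypot

def gwr_local_coefficients(coords, x, y, bandwidth):
    """Fit y = b0 + b1 * x at every location with Gaussian distance weights."""
    results = []
    for cx, cy in coords:
        sw = swx = swy = swxx = swxy = 0.0
        for (px, py), xv, yv in zip(coords, x, y):
            d = hypot(px - cx, py - cy)
            w = exp(-0.5 * (d / bandwidth) ** 2)  # nearer cells weigh more
            sw += w
            swx += w * xv
            swy += w * yv
            swxx += w * xv * xv
            swxy += w * xv * yv
        # Closed-form weighted least squares for one predictor plus intercept.
        b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
        b0 = (swy - b1 * swx) / sw
        results.append((b0, b1))
    return results

def mean_slope_by_group(coefficients, labels):
    """Average the local slope b1 within each group label."""
    sums, counts = {}, {}
    for (_, b1), label in zip(coefficients, labels):
        sums[label] = sums.get(label, 0.0) + b1
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

# Hypothetical grid cells (coordinates in km): cases rise faster with
# population density in the four "core" cells than in the "non-core" cells.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
pop_density = [5, 6, 7, 8, 1, 2, 3, 4]
new_cases = [15, 18, 21, 24, 1, 2, 3, 4]
labels = ["core"] * 4 + ["non-core"] * 4

local = gwr_local_coefficients(coords, pop_density, new_cases, bandwidth=2.0)
means = mean_slope_by_group(local, labels)
print(means)
```

Because each local fit is dominated by nearby cells, the averaged slopes differ between the two groups, which is the kind of contrast summarized in Figure 10.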
The interaction results and GWR regression coefficients indicate that higher population density, concentrated urban spaces, and complex transportation networks create favorable conditions for the rapid spread of infection. These findings have implications for public health policies and intervention measures, emphasizing the need for nuanced approaches to protect population health, especially in the densely populated core regions of urban centers [37–40].

Shortcomings and Prospects
This study achieved a high level of accuracy in establishing an epidemic risk assessment model using the geographic detector and GWR models. However, local economic conditions often influence epidemiological risks to a certain extent [41]. Although variables such as economic conditions are interconnected with epidemic risk, obtaining them at a fine spatial resolution is challenging, and fine-grained economic data typically carry significant measurement errors. Therefore, this study did not incorporate these data as risk factors. Subsequent research could integrate high-precision data on these risk factors to further enhance the model. Additionally, because this study reflects only spatial characteristics and does not account for temporal aspects, future work could improve the robustness of this analytical framework by integrating time-series models with the GWR model.

Conclusions
In examining spatial variations in COVID-19 risks within a city through the lens of urban resilience, we applied geographic coding techniques to gridify the distribution data of COVID-19 cases. These were then integrated with geographic big data, including points of interest, population density, and road network density, as risk factors. Utilizing the Geographically Weighted Regression model, we developed a risk assessment model to evaluate infection risks across different areas within the administrative regions over a 14-day period. Subsequently, we conducted a correlation analysis between the assessment results and the actual distribution of cases to gauge the model's precision. Our study led to several key conclusions:
1. The model crafted in this study accurately simulates the spatial variation in COVID-19 infection risks within diverse areas of the administrative regions. This underscores its reliability in assessing infection risks across different spatial units within the administrative regions;
2. By accounting for the interplay among risk factors, the explanatory power for the spatial distribution of new cases is heightened, revealing a synergistic effect;
3. The assessment of infection risks in Shanghai reveals a spatial pattern characterized by a gradual decrease from the city center towards the periphery. This indicates that the core areas of Shanghai provide favorable conditions for the spatial spread of diseases, resulting in elevated risks in the central regions.

Figure 3. Epidemic risk analysis chart based on urban resilience theory: (a) Area with low level of resistance; (b) Area with high level of resistance.

Figure 4. RFS interaction results (GLD was considered as x1, PD as x2, POI as x3, and RD as x4).

Sustainability 2023 , 17 Figure 7 .
Figure 7. GWR results vs. actual new cases: (a) GWR risk value results; (b) Spatial distribution of actual new cases.


Figure 9. Central area and other area correlation scatter plot: (a) Central area scatter plot; (b) Other area scatter plot.

Figure 10. The mean coefficients of the central region and other regions.

Table 2. Interaction analysis (GLD was considered as x1, PD as x2, POI as x3, and RD as x4).