Research on the Spatial Correlation and Spatial Lag of COVID-19 Infection Based on Spatial Analysis

Abstract: COVID-19 has spread throughout the world since the virus was discovered in 2019. This study therefore aimed to identify the global transmission trend of COVID-19 from the perspective of spatial correlation and spatial lag. The research used primary data on the daily increase in COVID-19 cases, confirmed diagnoses, recoveries, and deaths in 14 countries. The Moran index, computed with the composite spatial weight matrix, showed that the propagation of infection was spatially aggregated between 9 May and 21 May. The results of the Lagrange multiplier test indicated that COVID-19 patients can infect others with a lag. The global Moran index passes the statistical significance test, and its statistical significance increases over time. This shows that the infection rate of COVID-19 across countries is not randomly distributed in space but exhibits a positive spatial correlation. That is, infection rates are more similar among countries that are close and connected than among countries that are not geographically adjacent and lie far apart. The daily increase of the global Moran index and of its statistical significance also shows that the positive spatial correlation gradually strengthened over time.


Introduction
The outbreak of COVID-19 has had a large impact on people's lives and on sectors such as education, the economy, sports, public health, medical care, transportation, and politics worldwide. The continuous spread of COVID-19 has brought an unprecedented public health crisis to all mankind [1] and has pushed the world economy into a painful period of adjustment [2]. City lockdowns, business shutdowns, stay-at-home orders, and other epidemic prevention measures have resulted in a 'great lockdown', accompanied by economic stagnation, political instability, social division, regional segregation, and other problems [3]. Under the impact of the epidemic, expectations of world economic growth face continuous downward revision [4,5].
Since late February 2020, the novel coronavirus epidemic outside China has shown a spreading trend. As of March 5, a total of 14,768 cases had been diagnosed outside China [6], of which South Korea had the most, followed by Italy. The number of confirmed cases in these countries increased rapidly and reached the peak of the outbreak. In addition to Italy, the epidemic in other major European countries also grew. The cumulative number of confirmed cases exceeded 250 in both France and Germany and was close to 200 in Spain. The number of confirmed cases in the United States was also increasing and exceeded 100 (data source: World Health Organization, WHO). The epidemic has continued to spread globally since then.
On January 24, 2020, Jasper Fuk-Woo Chan et al. published the first paper confirming that COVID-19 has 'human transmission characteristics' and 'asymptomatic cases' [7]. This paper was the first to give conclusive evidence of human-to-human transmission of COVID-19. The conclusion rests on the authors' epidemiological analysis of the incidence of novel coronavirus pneumonia in seven people living in Shenzhen who had travelled to Wuhan.
Subsequently, Chen Jieliang published a paper comparing COVID-19 with the SARS coronavirus of 2003 and the Middle East respiratory syndrome (MERS) coronavirus of 2014 [8]. It shows that COVID-19 is the third emerging coronavirus in the past 20 years that can cross species to infect humans. Studies have shown that COVID-19 is not as strong as the SARS virus in terms of the virus rate or the basic reproduction number (R0). However, compared with influenza A and MERS, it is more infectious (its R0 value is larger). This is reflected in the number of people infected; that is, there are more confirmed cases of novel coronavirus pneumonia. On the whole, however, COVID-19 currently appears to have relatively low pathogenicity and moderate transmissibility.
According to the paper published by Joseph T Wu et al. [9], the scale of the epidemic in Wuhan was estimated from data on population flow, confirmed cases, and the estimated serial interval of the virus (the time it takes an infected person to infect others), and the epidemic in major Chinese and international cities was simulated with a susceptible-exposed-infectious-recovered (SEIR) metapopulation model. The study showed that if the transmissibility of COVID-19 is similar across the country, the epidemic will grow exponentially over time in many major cities in China, with outbreaks lagging about 1-2 weeks behind that of Wuhan.
On the basis of these studies, and in order to study the global diffusion trend of COVID-19, this paper adopts spatial econometric methods to analyze the spatial spread of COVID-19 among 14 influential countries. Spatial econometrics has become an important branch of econometrics. It introduces the spatial relationships among the geographical units under study into econometrics and divides spatial effects into spatial dependence and spatial heterogeneity, which provides a methodological basis for constructing regional and urban econometric models. Over the last 40 years, spatial econometrics has developed in three stages. The 1970s-1980s were the preparatory stage, driven mainly by two groups of scholars with different academic backgrounds: geographers, and scholars of regional science and of regional and urban economics. In the second stage, in the 1990s, scholars' interest turned to the specification of spatial econometric models. In the third stage, after 2000, the spatial econometric model became one of the important methods in mainstream empirical research [10]. During this period, research on spatial econometrics continued to deepen, its fields of application continued to expand, and a large number of empirical papers emerged. Applications have spread from specialized fields such as regional science, urban and real estate economics, and economic geography to labor economics, international economics, resource and environmental economics, politics, development economics, and other fields. For example, based on panel data for 97 cities in the Yangtze River Basin (YRB) from 2005 to 2016, Zhang Qian et al. used a spatial econometric model to analyze the impact of urbanization on environmental regulation efficiency (ERE) in the YRB [11].
Mobley Lee R used spatial econometric methods to model hospital market pricing, using price reaction curves estimated from California data. The theoretical model shows how the slope of the reaction function reflects hospital specialization and how equilibrium prices are affected by shifts in the reaction function [12].
Alexakis Christos et al. used spatial econometric methods to study the impact of governments' social distancing measures against COVID-19, as reflected in 45 major stock market indices [13].
Spatial econometrics is thus widely used. In this paper, we use spatial econometric methods to study the spread of COVID-19 among 14 major countries, analyzing the transmission pattern of COVID-19 through the trend of increasing infection across countries.
The rest of the paper is organized as follows. Section 2 introduces data sets and research methods such as the Moran index, spatial weight matrix, Lagrange multiplier test method, and classical spatial econometric model. Section 3 introduces the theoretical framework of spatial econometrics. Section 4 analyzes the data of the Moran index, Lagrange multiplier test, spatial lag model, and the spatial Durbin model based on different spatial weight matrices. Section 5 analyzes the results. Finally, some conclusions are drawn in Section 6.

Data Collection
We collected the daily increase in the number of COVID-19 cases, confirmed diagnoses, recoveries, and deaths in 14 countries (data from 9 May 2020 to 29 May 2020, 21 days in total; source: https://github.com/CSSEGISandData/COVID-19, last accessed 20 December 2020). To better analyze the spatial effects of virus transmission among these countries, we selected the daily infection rate as the explained variable and the numbers of confirmed cases, recoveries, and deaths as the explanatory variables. To account for the incubation period of COVID-19, the daily infection rate of each country is calculated as follows: the cumulative number of new cases in the country over 14 days is divided by the country's total population (in billions) and then by 14 days. To bring the variables onto comparable scales, the infection rate was divided by 10, the number of confirmed diagnoses by 10,000, the number of recoveries by 1000, and the number of deaths by 1000.
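The rate calculation and rescaling described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names and all case counts are hypothetical, and the denominator is assumed to be the country's population in billions.

```python
import numpy as np

def daily_infection_rate(new_cases_14d, population_billions):
    """Average daily new cases over a 14-day window, per billion inhabitants."""
    new_cases_14d = np.asarray(new_cases_14d, dtype=float)
    if new_cases_14d.shape[-1] != 14:
        raise ValueError("expected 14 daily values")
    return new_cases_14d.sum(axis=-1) / population_billions / 14.0

def rescale(rate, confirmed, recovered, deaths):
    """Apply the divisors stated in the text so the variables have similar scales."""
    return rate / 10.0, confirmed / 10_000.0, recovered / 1_000.0, deaths / 1_000.0

# Hypothetical country: 1,400 new cases per day for 14 days, 0.07 billion people.
rate = daily_infection_rate([1400] * 14, population_billions=0.07)
scaled = rescale(rate, confirmed=250_000, recovered=40_000, deaths=30_000)
print(rate, scaled)
```

Dividing by 14 turns the 14-day cumulative count into a daily average, so the rate stays comparable if a different window length were ever used.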
First, the Moran index test was performed to determine whether the model exhibits a spatial effect. Then, the Lagrange multiplier test was performed to determine whether the spatial lag model is preferable. The spatial weight matrix W is used to analyze the spatial relationship of COVID-19 transmission among the United States, Canada, Britain, South Korea, Japan, Italy, France, Germany, Spain, the Netherlands, Switzerland, India, Brazil, and China. The latitude and longitude of one central city per country were used to calculate the distances between countries: Washington (United States), Ottawa (Canada), Belfast (United Kingdom), Seoul (South Korea), Tokyo (Japan), Rome (Italy), Paris (France), Berlin (Germany), Madrid (Spain), Amsterdam (Netherlands), Bern (Switzerland), New Delhi (India), Brasilia (Brazil), and Beijing (China). For ease of description, we denote the binary spatial weight matrix by W1, the inverse distance spatial weight matrix by W2, and the composite spatial weight matrix by W3, and use these matrices in the analysis.

Moran's I Index
This paper uses Moran's I to test whether there is spatial correlation in the region. The Moran index test is based on least squares estimation [14][15][16][17]. Moran's I reflects the similarity of the attribute values of spatially adjacent regional units, and it tests whether adjacent areas across the whole study area are positively correlated, negatively correlated, or independent of each other. The global Moran index is calculated as follows [18]:
$$I = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{S^{2}\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}$$
where $x_i$ represents the observation for spatial unit $i$ in the sample, $w_{ij}$ is an element of the spatial weight matrix, and $n$ denotes the total number of spatial units studied. The mean and variance of the observations are
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad S^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2}.$$
Moran's I generally ranges from −1 to 1. A value greater than 0 represents positive correlation, and a value less than 0 represents negative correlation. The closer the value is to 1, the more similar attributes gather together, that is, the higher the spatial aggregation; the closer it is to −1, the more dissimilar attributes gather together, that is, the stronger the spatial dispersion. A value of 1 represents complete spatial aggregation, −1 represents complete spatial dispersion, and 0 represents a random distribution in space, that is, no spatial autocorrelation.
The value of Moran's I alone, however, does not show whether the statistic is statistically significant. For example, Moran's I = 0.1 indicates positive spatial autocorrelation, but it is not necessarily statistically significant. For statistical testing, Moran's I can be converted to a Z value, $Z = (I - E(I))/\sqrt{\mathrm{Var}(I)}$; the spatial autocorrelation is significant at the 5% level if Z is greater than 1.96 or less than −1.96 [6,19].
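A minimal sketch of the global Moran's I and its Z value is given below. It assumes the variance of I under the normality assumption, which is one standard choice and not necessarily the one used in this study; the function name and the four-unit example are hypothetical.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I and its Z value under the normality assumption."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    z = x - x.mean()
    s0 = W.sum()
    # Equivalent to sum_ij w_ij z_i z_j / (S^2 * sum_ij w_ij) with S^2 = z'z / n.
    moran = (n / s0) * (z @ W @ z) / (z @ z)

    # Moments of I under the null hypothesis of no spatial autocorrelation.
    expected = -1.0 / (n - 1)
    s1 = 0.5 * ((W + W.T) ** 2).sum()
    s2 = ((W.sum(axis=0) + W.sum(axis=1)) ** 2).sum()
    variance = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2
    return moran, (moran - expected) / np.sqrt(variance)

# Hypothetical example: four units on a line, each adjacent to its neighbours.
W_chain = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
moran, z_value = morans_i([1, 2, 3, 4], W_chain)
print(round(moran, 3), round(z_value, 3))
```

In this small example I is 1/3 but the Z value is about 1.73, below 1.96, which illustrates why a positive index is not by itself evidence of significant autocorrelation.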

Spatial Weight Matrix
At present, the main forms of the spatial weight matrix include the adjacency matrix, inverse distance matrix, economic characteristic matrix, and nested matrix [20]. In this study, we consider that the spread of COVID-19 is related to the distance between countries and to whether they are adjacent. The spatial weight matrix is therefore constructed by three methods. The first method is the binary spatial weight matrix, which is set according to geographical adjacency: adjacent spatial units are assigned 1 and non-adjacent units 0. The weight matrix is defined as follows [21]:
$$w_{ij} = \begin{cases} 1, & \text{when place } i \text{ is adjacent to place } j, \\ 0, & \text{when place } i \text{ is not adjacent to place } j. \end{cases}$$
Referring to the world map, as shown in Figure 1, the adjacency among the 14 countries is coded as follows; every pair not listed is recorded as 0. The United States and Canada are adjacent (1). The United Kingdom, Japan, South Korea, and Brazil are not adjacent to any of the other countries (0). Italy is adjacent to France and Switzerland (1). France is adjacent to Spain, Germany, Italy, and Switzerland (1). Germany is adjacent to France, the Netherlands, and Switzerland (1). Spain is adjacent to France (1). The Netherlands is adjacent to Germany (1). Switzerland is adjacent to France, Germany, and Italy (1). India is adjacent to China, but because China imposed a timely blockade as a matter of policy, China's links with India and its other neighboring countries are recorded as 0.
The second method is the inverse distance spatial weight matrix. According to the first law of geography, the spatial interaction between spatial units weakens as the distance between them grows. No threshold distance is imposed; that is, however far apart two spatial units are, they still interact. The spatial weight matrix is therefore constructed from the reciprocal of distance, namely:
$$w^{d}_{ij} = \begin{cases} 1/d_{ij}, & i \neq j, \\ 0, & i = j, \end{cases}$$
where $d_{ij}$ is the distance between place $i$ and place $j$.
The third method is the composite spatial weight matrix, which combines the distance and adjacency matrices. Adding the geographical distance spatial weight matrix and the binary spatial weight matrix together and standardizing the sum gives a composite spatial weight matrix based on both influencing factors, namely:
$$w^{c}_{ij} = w_{ij} + w^{d}_{ij},$$
followed by standardization, where $w_{ij}$ is the element of the binary spatial weight matrix. This matrix is used in the subsequent analysis.
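The three constructions can be sketched for a small subset of countries as follows. This is an illustration only: the coordinates are approximate values supplied here, great-circle (haversine) distance and row-standardization are assumptions, and the function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def inverse_distance_matrix(coords):
    """w_ij = 1 / d_ij for i != j, zero on the diagonal."""
    n = len(coords)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i, j] = 1.0 / haversine_km(*coords[i], *coords[j])
    return W

def row_standardize(W):
    """Scale each row to sum to one; all-zero rows are left unchanged."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums != 0)

# Approximate (lat, lon) of Washington, Ottawa, Paris, Berlin (assumed values).
coords = [(38.90, -77.04), (45.42, -75.70), (48.86, 2.35), (52.52, 13.40)]

# Binary matrix: United States-Canada and France-Germany are adjacent.
W1 = np.array([[0, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=float)
W2 = inverse_distance_matrix(coords)
W3 = row_standardize(W1 + W2)  # composite matrix after standardization
print(np.round(W3, 4))
```

Because the binary entries are far larger than reciprocal distances measured in kilometres, adjacency dominates each row of the composite matrix in this sketch; rescaling the two components before adding them would be a different design choice.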

Lagrange Multiplier Test
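In this paper, the Lagrange multiplier test, a statistical test, is used to select the spatial econometric model. The Lagrange multiplier test, also called the LM test, is based on maximum likelihood estimation. There are two basic models in spatial econometrics: the spatial lag model and the spatial error model. In empirical analysis, the Lagrange multiplier test [22,23] is the statistical tool most commonly used to choose between the two models. It comprises the LM-Error test, the robust LM-Error test, the LM-Lag test, and the robust LM-Lag test. The four statistics are, respectively:
$$\text{LM-Error} = \frac{\left(e'We/\hat{\sigma}^{2}\right)^{2}}{T}$$
$$\text{Robust LM-Error} = \frac{\left(e'We/\hat{\sigma}^{2} - T J^{-1}\, e'Wy/\hat{\sigma}^{2}\right)^{2}}{T\left(1 - T J^{-1}\right)}$$
$$\text{LM-Lag} = \frac{\left(e'Wy/\hat{\sigma}^{2}\right)^{2}}{J}$$
$$\text{Robust LM-Lag} = \frac{\left(e'Wy/\hat{\sigma}^{2} - e'We/\hat{\sigma}^{2}\right)^{2}}{J - T}$$
Among them:
$$T = \mathrm{tr}\left(W'W + W^{2}\right), \qquad J = \frac{(WX\hat{\beta})' M (WX\hat{\beta}) + T\hat{\sigma}^{2}}{\hat{\sigma}^{2}}, \qquad M = I - X(X'X)^{-1}X', \qquad \hat{\sigma}^{2} = \frac{e'e}{n},$$
where $e$ is the vector of OLS residuals and $\hat{\beta}$ is the OLS estimate of the model parameters under the null hypothesis. These four statistics correspond to four situations of the LM test of the spatial econometric model.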
(1) LM-Error statistic: an LM test of spatial residual correlation in the absence of a spatial autoregressive effect. The null hypothesis is that the model residual contains no spatial correlation. The alternative hypothesis is that there is a spatial effect in the residual, which includes spatial residual autocorrelation and a spatial residual moving average.
(2) LM-Lag statistic: an LM test of the spatial autoregressive effect in the absence of spatial residual correlation. The null hypothesis of no spatial autoregressive effect is tested under the assumption that the model residual contains no spatial correlation.
(3) Robust LM-Error statistic: an LM test of spatial residual correlation in the presence of a spatial autoregressive effect. The null hypothesis is that there is no spatial correlation in the model residual; the alternative hypothesis is the same as for the LM-Error statistic.
(4) Robust LM-Lag statistic: an LM test of the spatial autoregressive effect in the presence of spatial residual correlation.
Based on the four LM statistics, the selection procedure and criteria are as follows. First, an OLS regression is run to obtain the residuals of the regression model, and the LM diagnostics are then computed from these residuals. Anselin and Florax (1996) proposed the following criteria: if LM-Lag is more significant than LM-Error in the spatial dependence test, and R-LM-Lag is significant while R-LM-Error is not, the spatial lag model is the appropriate model. Conversely, if LM-Error is more significant than LM-Lag, and R-LM-Error is significant while R-LM-Lag is not, the spatial error model is the appropriate model.
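The four diagnostics can be computed from OLS residuals as in the sketch below. It uses a commonly cited form of the statistics (each asymptotically chi-square with one degree of freedom), which is assumed here rather than taken from the authors' code; the function names and the simulated data are hypothetical.

```python
import math
import numpy as np

def lm_tests(y, X, W):
    """LM-Error, LM-Lag and their robust versions from OLS residuals."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    sigma2 = e @ e / n

    T = np.trace(W.T @ W + W @ W)
    WXb = W @ X @ beta
    # Apply M = I - X (X'X)^{-1} X' to WXb without forming M explicitly.
    M_WXb = WXb - X @ np.linalg.lstsq(X, WXb, rcond=None)[0]
    J = (WXb @ M_WXb + T * sigma2) / sigma2

    d_err = e @ W @ e / sigma2
    d_lag = e @ W @ y / sigma2
    return {
        "LM-Error": d_err**2 / T,
        "LM-Lag": d_lag**2 / J,
        "Robust LM-Error": (d_err - T / J * d_lag) ** 2 / (T * (1 - T / J)),
        "Robust LM-Lag": (d_lag - d_err) ** 2 / (J - T),
    }

def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Simulated data with a true spatial lag on a ring of 60 units (illustrative only).
rng = np.random.default_rng(0)
n = 60
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))

results = lm_tests(y, X, W)
for name, value in results.items():
    print(f"{name}: {value:.3f} (p = {chi2_1_pvalue(value):.4f})")
```

A useful internal check is that LM-Lag plus robust LM-Error equals LM-Error plus robust LM-Lag, since both sums give the joint test of lag and error dependence.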

Spatial Model
This paper selects the spatial lag model, the spatial error model, and the spatial Durbin model for the analysis.
(1) The spatial lag model is as follows [24]:
$$y = \rho W y + X\beta + \varepsilon$$
where $y$ is the $n \times 1$ vector of the dependent variable, $X$ is the $n \times k$ matrix of independent variables, $\rho$ is the spatial autoregressive coefficient to be estimated, $W$ is the $n \times n$ spatial weight matrix, $\beta$ is the $k \times 1$ vector of coefficients of the independent variables to be estimated, and $\varepsilon$ is the $n \times 1$ error term. For simplicity, we refer to the spatial lag model as the SAR model below.
(2) The spatial error model is as follows [25,26]:
$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon$$
where $y$ is the $n \times 1$ vector of the dependent variable, $X$ is the $n \times k$ matrix of independent variables, $\beta$ is the $k \times 1$ vector of coefficients of the independent variables to be estimated, $W$ is the $n \times n$ spatial weight matrix, $\lambda$ is the spatial autocorrelation coefficient of the error term to be estimated, and $\varepsilon$ is the $n \times 1$ error term. For simplicity, we refer to the spatial error model as the SEM model below.
(3) The spatial Durbin model is as follows [27][28][29][30][31]:
$$y = \rho W y + X\beta + W X\theta + \varepsilon$$
where $y$ is the $n \times 1$ vector of the dependent variable, $X$ is the $n \times k$ matrix of independent variables, $\rho$ is the spatial autoregressive coefficient to be estimated, $W$ is the $n \times n$ spatial weight matrix, $\beta$ is the $k \times 1$ vector of coefficients of the independent variables to be estimated, $\theta$ is the $k \times 1$ vector of coefficients of the spatial lags of the independent variables to be estimated, and $\varepsilon$ is the $n \times 1$ error term. For simplicity, we refer to the spatial Durbin model as the SDM model below.
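To make the SAR specification concrete, the sketch below simulates data from a spatial lag process and recovers the spatial autoregressive coefficient by maximizing the concentrated log-likelihood over a grid. The grid search, the function name, and the simulated ring-shaped weight matrix are illustrative assumptions; dedicated spatial econometrics software would normally be used instead.

```python
import numpy as np

def fit_sar_grid(y, X, W, grid=None):
    """Concentrated ML for y = rho*W*y + X*beta + eps, maximized over a rho grid."""
    n = len(y)
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    Wy = W @ y
    # OLS of y and of Wy on X give the two residual vectors of the concentrated likelihood.
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]
    bL = np.linalg.lstsq(X, Wy, rcond=None)[0]
    e0, eL = y - X @ b0, Wy - X @ bL
    eye = np.eye(n)

    def loglik(rho):
        resid = e0 - rho * eL
        _, logdet = np.linalg.slogdet(eye - rho * W)
        return logdet - 0.5 * n * np.log(resid @ resid / n)

    values = np.array([loglik(r) for r in grid])
    rho_hat = grid[values.argmax()]
    beta_hat = b0 - rho_hat * bL  # beta implied by the chosen rho
    return rho_hat, beta_hat, values.max()

# Simulated SAR data on a ring of 100 units with true rho = 0.5 (illustrative only).
rng = np.random.default_rng(1)
n = 100
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))

rho_hat, beta_hat, loglik_hat = fit_sar_grid(y, X, W)
print(rho_hat, beta_hat)
```

The log-determinant term is what distinguishes this from regressing y on Wy by OLS, which would be inconsistent because Wy is correlated with the error term. An SDM fit follows from the same routine by appending the columns of W @ X (excluding the constant) to X.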

Theoretical Framework
The four key problems of spatial econometric modeling are the establishment of spatial weight, the spatial effect test, spatial model selection, and spatial model estimation.
Setting the spatial weight matrix is important for the spatial econometric model. At present, the main weight forms include the adjacency matrix, inverse distance matrix, economic characteristic matrix, and nested matrix. The simplest is the spatial adjacency matrix [32]. Moran (1947) and Geary (1954) constructed binary spatial adjacency matrices based on 0 and 1 [33,34]. Following their approach, adjacency is mainly defined in four ways, named after the moves of chess pieces: first, the 'Rook' rule, under which spatial units that share a common edge are adjacent; second, the 'Queen' rule, under which spatial units that share a common edge or a common vertex are adjacent; third, the second-order Rook contiguity spatial weight matrix; and fourth, the second-order Queen contiguity spatial weight matrix. Because the spatial effect generated by spatial spillover gradually decreases as geographical distance increases, the spatial weight matrix can also be set according to the speed of attenuation. In this case, the reciprocal of geographical distance is generally used to represent the proximity of spatial units, which gives the inverse distance matrix. The economic characteristic weight assumes that the closer the economic aggregates of two regions, the stronger the spatial spillover effect, following the same logic as the inverse distance spatial weight. When spatial effects involve both distance and economic factors, a nested matrix is used [35]. The nested matrix combines the inverse distance weight matrix and the economic characteristic weight matrix in order to depict the comprehensiveness and complexity of the spatial effect as accurately as possible.
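The Rook and Queen rules are easiest to see on a regular grid. The sketch below builds both contiguity matrices for a hypothetical 3 × 3 grid and derives a second-order Rook matrix, taking "second order" to mean neighbours of neighbours that are not first-order neighbours, which is one common convention and an assumption here.

```python
import numpy as np

def contiguity_matrix(n_rows, n_cols, rule="rook"):
    """Binary contiguity matrix for the cells of a regular grid, numbered row by row."""
    if rule == "rook":
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    W = np.zeros((n_rows * n_cols, n_rows * n_cols), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c, rr * n_cols + cc] = 1
    return W

rook = contiguity_matrix(3, 3, "rook")
queen = contiguity_matrix(3, 3, "queen")

# Second-order Rook: reachable in two Rook steps, excluding first-order neighbours and self.
second_order_rook = ((rook @ rook > 0) & (rook == 0)).astype(int)
np.fill_diagonal(second_order_rook, 0)

print(rook[4].sum(), queen[4].sum(), second_order_rook[4].sum())
```

For the centre cell the Rook rule gives four neighbours and the Queen rule eight, while the second-order Rook neighbours are the four corner cells.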
Spatial data analysis is an indispensable part of empirical research on spatial data. After an appropriate spatial weight matrix is chosen, the spatial data can be analyzed preliminarily before a spatial econometric model is established. Socio-economic empirical research mainly uses global spatial autocorrelation analysis to measure and test the spatial distribution pattern. Global spatial autocorrelation reflects the spatial distribution pattern of a variable over the whole study area. The classical global spatial autocorrelation statistics are Moran's I and the Getis-Ord General G [36]. Moran's I, proposed by Moran (1950), is the most widely used spatial autocorrelation statistic. Getis and Ord (1992) proposed another global statistic, the Getis-Ord General G, which can identify clustering of high or low values and is also commonly used. When both statistics are statistically significant, Moran's I indicates only whether the data are spatially aggregated, dispersed, or randomly distributed, whereas the Getis-Ord General G indicates the type of spatial aggregation, for example whether the clusters are hot spots of high values.
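For readers unfamiliar with the General G, a minimal sketch follows. It uses the usual definition, the weighted sum of cross-products of distinct pairs divided by the unweighted sum, and compares it with its expectation under spatial randomness; the function name and the four-unit example are hypothetical, and the statistic is not used elsewhere in this paper.

```python
import numpy as np

def general_g(x, W):
    """Getis-Ord General G and its expected value under spatial randomness."""
    x = np.asarray(x, dtype=float)  # values are assumed to be non-negative
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)        # only pairs with i != j enter the statistic
    cross = np.outer(x, x)
    np.fill_diagonal(cross, 0.0)
    n = x.size
    g = (W * cross).sum() / cross.sum()
    expected = W.sum() / (n * (n - 1))
    return g, expected

# Hypothetical example: four units on a line with values rising along the line.
W_chain = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
g, g_expected = general_g([1, 2, 3, 4], W_chain)
print(round(g, 4), g_expected)
```

A value of G above its expectation points to clustering of high values, and a value below it to clustering of low values; a formal conclusion still requires the corresponding Z value.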
A further spatial autocorrelation statistic is Geary's C [37]. Because Geary's C is not widely used in empirical analysis, it is not described further here. This paper uses Moran's I to test whether there is spatial correlation in the region.
The establishment of the spatial econometric model is the core of spatial econometrics. After spatial correlation has been established by the Moran index test, an appropriate spatial model is selected. The commonly used spatial econometric models can be grouped into the following 10 types. Models with spatial correlation in the error term are the spatial error model (SEM), the spatial moving average model (SMA), and the spatial error component model (SEC) [38]. The spatial lag of X model (SLX) has spatial correlation in the explanatory variables. Models with spatial correlation in the explained variable are the first-order spatial autoregressive model (FAR) and the spatial lag model (SAR). Models with mixed spatial correlation include the spatial autoregressive moving average model (SARMA), the general spatial model (SAC), the spatial Durbin model (SDM), and the spatial Durbin error model (SDEM).
The spatial error model (SEM), spatial lag model (SAR), and spatial Durbin model (SDM) are the three most classical spatial econometric models. The SAC model is a general spatial model that includes both a spatial lag and a spatial error term, and so accounts for spatial correlation in both the dependent variable and the disturbance term. The SARMA model combines a local spatial moving average with a global spatial autoregressive process. The SAC and SARMA models differ in the process that generates the disturbance. The first-order spatial autoregressive model (FAR), the spatial error component model (SEC), and the spatial Durbin error model (SDEM) are less common. Similar to the first-order autoregressive model in time series analysis, FAR is mainly used to study how changes in the explained variable in adjacent regions affect the explained variable in the region studied. SEC was proposed by Kelejian and Robinson in 1993 and 1995. The biggest difference between SEC on the one hand and SEM and SMA on the other is that the SEC error term has no spatial correlation coefficient and is composed of two independent error components. SDEM, proposed by LeSage and Pace [39], adds only the spatial lag of the explanatory variables to the SEM model. The spatial lag of X model (SLX) takes spatial externalities into account; that is, the dependent variable of a spatial unit is affected both by its own independent variables and by those of adjacent units. Based on the relationships among these linear models, Elhorst (2014) describes the path from the specific spatial models and the OLS model to the general model [40].
How should the appropriate model be selected from among the many spatial models? The existing selection methods for spatial econometric models can be summarized as follows: methods based on statistical tests, such as the Moran index test and the LM test; maximum likelihood-based methods, including the Akaike information criterion (AIC), the Bayesian information criterion or Schwarz criterion (BIC), and the Hannan-Quinn criterion, as well as QAIC, an information criterion for overdispersed data; Bayesian selection based on model posterior probability; and MCMC-based spatial econometric model selection [41]. Log likelihood (LogL), likelihood ratio (LR), Akaike information criterion (AIC), Schwarz criterion (SC), and other test criteria commonly used for non-spatial models follow the same principle in spatial econometrics, but their calculation is more complex, and they are not detailed here. In this paper, the Lagrange multiplier test, a statistical test, is used to select the spatial econometric model.
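Although this paper relies on the LM test, the likelihood-based criteria mentioned above are simple to compute once a model's maximized log-likelihood is known. The sketch below uses the standard definitions; the function name, the log-likelihood values, and the parameter counts are hypothetical and are not results from this study.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """AIC, BIC (Schwarz) and Hannan-Quinn criteria; smaller values are preferred."""
    return {
        "AIC": 2 * n_params - 2 * log_likelihood,
        "BIC": n_params * math.log(n_obs) - 2 * log_likelihood,
        "HQC": 2 * n_params * math.log(math.log(n_obs)) - 2 * log_likelihood,
    }

# Hypothetical fits on 14 countries x 21 days = 294 observations.
sar = information_criteria(log_likelihood=-120.0, n_params=5, n_obs=294)
sdm = information_criteria(log_likelihood=-118.5, n_params=8, n_obs=294)
print(sar)
print(sdm)
```

In this made-up comparison the larger model improves the log-likelihood by only 1.5, which does not offset the penalty for three extra parameters, so all three criteria favour the smaller model.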

Moran Index
To determine whether the spread of COVID-19 is spatially correlated across countries, we calculate the Moran index based on the binary spatial weight matrix (W1), the inverse distance spatial weight matrix (W2), and the composite spatial weight matrix (W3), respectively. Before the model is specified, a spatial autocorrelation test of the core variables is carried out, using the global Moran's I as the measure. Table 1 shows that the core variables in this paper have obvious spatial autocorrelation. Based on the binary spatial weight matrix (W1), Table 1 shows that the spatial autocorrelation test is not significant at the 1% level. This indicates that W1 captures only some of the spatial factors in the spread of COVID-19 across countries.
Based on the inverse distance spatial weight matrix (W2), Table 1 shows that the spatial autocorrelation test is significant at the 1% level. This indicates that W2 captures the spatial factors in the spread of COVID-19 across countries.
Based on the composite spatial weight matrix (W3), the global Moran index values calculated for each day from 9 May 2020 to 29 May 2020 are 0.574, 0.590, 0.592, 0.602, 0.632, 0.654, 0.685, 0.702, 0.685, 0.690, 0.700, 0.704, 0.710, 0.710, 0.713, 0.710, 0.720, 0.720, 0.714, 0.694, and 0.681, respectively. All values are positive and lie between 0.57 and 0.73; they show an overall increasing trend, rising from 0.574 to a peak of 0.720 on 25 and 26 May before easing to 0.681. The greater the positive value of the Moran index, the stronger the spatial correlation, so COVID-19 has a positive spatial correlation among countries. In the accompanying statistical test, all Z values are greater than 1.96. The global Moran index passes the 1% statistical significance test, and its statistical significance increases. This shows that the infection rate of COVID-19 across countries is not randomly distributed in space but exhibits a positive spatial correlation. That is, infection rates are more similar among countries that are close and connected than among countries that are not geographically adjacent and lie far apart. The daily increase of the global Moran index and of its statistical significance also shows that the positive spatial correlation gradually strengthened over time.
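The range and trend of the W3 series quoted above can be checked directly from the 21 reported values. The short sketch below does so; the least-squares line is only a rough summary of the trend chosen here for illustration, not a test used in the paper.

```python
from datetime import date, timedelta
import numpy as np

# Global Moran's I under W3, 9 May to 29 May 2020, as reported in the text.
moran_w3 = [0.574, 0.590, 0.592, 0.602, 0.632, 0.654, 0.685, 0.702, 0.685,
            0.690, 0.700, 0.704, 0.710, 0.710, 0.713, 0.710, 0.720, 0.720,
            0.714, 0.694, 0.681]
dates = [date(2020, 5, 9) + timedelta(days=k) for k in range(len(moran_w3))]

values = np.array(moran_w3)
# Slope of a least-squares line through the series (change in I per day).
slope = np.polyfit(np.arange(values.size), values, 1)[0]
peak_dates = [d for d, v in zip(dates, moran_w3) if v == max(moran_w3)]

print(values.min(), values.max(), slope > 0)
print([d.isoformat() for d in peak_dates])
```

The fitted slope is positive, consistent with an overall rise, while the peak on 25 and 26 May shows that the last three days of the window dip slightly.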

LM Test
The spatial correlation test above shows that there is a significant positive spatial correlation in the spread of COVID-19 across countries. In this situation, estimation with an ordinary panel data model leads to errors in the estimated coefficients, so a spatial econometric model should be used. Spatial econometric models mainly include three types, the spatial lag model (SAR), the spatial error model (SEM), and the spatial Durbin model (SDM), and the choice among them needs to be tested. Tables 2-4 report the LM tests based on the different spatial weight matrices. Following Anselin's model selection method, the binary spatial weight matrix W1 is used first. The results show that the p-values of LM-Lag and LM-Error are both less than 0.05; that is, both statistics reject the null hypothesis at the 5% significance level. These two statistics alone therefore cannot determine whether to choose the spatial lag model or the spatial error model. Turning to the robust LM tests, the robust LM-Lag statistic is significant, which points to the spatial lag model.
Second, the LM test is carried out based on the inverse distance spatial weight matrix W2. The results show that LM-Lag is significant at the 1% level, whereas LM-Error is not. Therefore, the spatial lag model is chosen.
Finally, the LM test is conducted based on the composite spatial weight matrix, namely W3. The results show that the LM-Lag is tested at 1% confidence level, LM-Error is not tested at the 1% confidence level, robust LM test has no spatial lag, and probability is more significant in statistics. Therefore, choose the space lag model.
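The decision rule used above can be illustrated with a short sketch of the four LM statistics. The code uses the standard cross-sectional forms of the LM-Lag, LM-Error, and robust LM statistics computed from OLS residuals; the panel versions behind Tables 2-4 differ in detail, so this is an illustration under that simplifying assumption. The function name `lm_tests`, the ring-shaped weight matrix, and the simulated data are all hypothetical.

```python
import math

import numpy as np


def lm_tests(y, X, W):
    """Cross-sectional LM and robust LM tests for spatial lag and error.

    Returns {name: (statistic, p_value)}; each statistic is chi-square(1).
    """
    n = len(y)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta                       # OLS residuals
    sigma2 = e @ e / n
    M = np.eye(n) - X @ xtx_inv @ X.T      # residual-maker matrix
    T = np.trace(W.T @ W + W @ W)
    wxb = W @ X @ beta
    D = (wxb @ M @ wxb) / sigma2 + T
    d_lag = (e @ W @ y) / sigma2           # score for the spatial lag
    d_err = (e @ W @ e) / sigma2           # score for the spatial error
    stats = {
        "LM-Lag": d_lag ** 2 / D,
        "LM-Error": d_err ** 2 / T,
        "Robust LM-Lag": (d_lag - d_err) ** 2 / (D - T),
        "Robust LM-Error": (d_err - (T / D) * d_lag) ** 2 / (T * (1 - T / D)),
    }
    # Survival function of chi-square with 1 degree of freedom.
    return {k: (float(v), math.erfc(math.sqrt(v / 2))) for k, v in stats.items()}


# Hypothetical example: 30 units on a ring, each with two neighbors.
rng = np.random.default_rng(0)
n_units = 30
W_ring = np.zeros((n_units, n_units))
for i in range(n_units):
    W_ring[i, (i - 1) % n_units] = 0.5
    W_ring[i, (i + 1) % n_units] = 0.5
X_demo = np.column_stack([np.ones(n_units), rng.normal(size=n_units)])
y_demo = X_demo @ np.array([1.0, 2.0]) + rng.normal(size=n_units)

lm_results = lm_tests(y_demo, X_demo, W_ring)
for name, (stat, p) in lm_results.items():
    print(f"{name}: statistic = {stat:.3f}, p = {p:.3f}")
```

When both LM-Lag and LM-Error reject the null hypothesis, as with W1 above, the robust versions are compared and the model whose robust statistic is significant is preferred.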

Spatial Model
Based on the LM test, the statistical analysis favors the spatial lag model. However, when the empirical analysis adopts the spatial lag model, it is preferable to extend it to the spatial Durbin model. Therefore, Table 5 lists the estimation results of the spatial lag model and the spatial Durbin model based on the different spatial weight matrices.
Based on the spatial weight matrix (W1), the estimation results of the SAR model and the SDM model are compared. The signs of the estimated coefficients are basically consistent between the two models; only their magnitudes differ markedly. In the SAR estimation results, for the core explanatory variable, the estimated coefficient of confirmed/10,000 is 4.494 and passes the 1% statistical significance test, indicating that an increase in the number of confirmed cases promotes an increase in the infection rate. For the control variables, the estimated coefficient of recovered/1000 is −0.778 and passes the 1% statistical significance test. This indicates that an increase in the number of recovered people inhibits the wider spread of infection and thereby restrains the rise in the infection rate. The estimated coefficient of deaths/1000 is 3.350 and passes the 5% statistical significance test, showing that an increase in the number of deaths is associated with a higher infection rate. According to Table 5, the R-squared is 0.486, indicating that the SAR model needs to be improved. In the SDM estimation results, the estimated coefficient of confirmed/10,000 is 1.030, and the estimated coefficient of recovered/1000 is −0.174. Although the coefficients and significance levels of confirmed/10,000 and recovered/1000 are lower than those of the SAR model, they can still explain the correlation between countries. Although this correlation is weak, it remains an indispensable factor when considering the spatial correlation of COVID-19 across countries. The coefficient of deaths/1000 is 7.456 and passes the 1% statistical significance test; compared with the SAR model, the coefficient is larger and its significance level is higher. According to Table 5, the R-squared of the SDM model is 0.713, which is higher than that of the SAR model. From this point of view, the SDM model is relatively better.
Based on the spatial weight matrix (W2), the estimation results of the SAR model and the SDM model are compared. The signs of the estimated coefficients are basically consistent between the two models. In the SAR estimation results, for the core explanatory variable, the estimated coefficient of confirmed/10,000 is 1.763 and passes the 1% statistical significance test. For the control variables, the estimated coefficient of recovered/1000 is −0.747 and the estimated coefficient of deaths/1000 is 6.014, both of which pass the 1% statistical significance test. According to Table 5, the R-squared is 0.865, indicating that the SAR model fits the data well. In the SDM estimation results, the estimated coefficient of confirmed/10,000 is 3.012, the estimated coefficient of recovered/1000 is −0.902, and the estimated coefficient of deaths/1000 is 5.888, all of which pass the 1% statistical significance test. The R-squared of the SDM model is 0.878, which is higher than that of the SAR model. Taken together, the SDM model is relatively better.
Based on the spatial weight matrix (W3), the estimation results of the SAR model and the SDM model are compared, and the signs of the estimated coefficients are basically consistent between the two models. In the SAR estimation results, for the core explanatory variable, the estimated coefficient of confirmed/10,000 is 1.217 and passes the 5% statistical significance test, indicating that an increase in the number of confirmed cases promotes the infection rate. For the control variables, the estimated coefficient of recovered/1000 is −0.579 and passes the 1% statistical significance test. This suggests that an increase in the number of recovered people restrains the rise in the infection rate. The estimated coefficient of deaths/1000 is 6.392 and passes the 1% statistical significance test, showing that an increase in the number of deaths is associated with a higher infection rate. According to Table 5, the R-squared is 0.827, indicating that the SAR model fits the data well. In the SDM estimation results, the estimated coefficient of confirmed/10,000 is 2.760 and passes the 1% statistical significance test. The estimated coefficient of recovered/1000 is −0.933 and passes the 1% statistical significance test. Compared with the SAR model, the coefficients of confirmed/10,000 and recovered/1000 are larger in magnitude, and their significance levels are higher. The estimated coefficient of deaths/1000 is 6.246 and passes the 1% statistical significance test; it decreases slightly relative to the SAR model but remains significant. These variables are therefore indispensable factors in the model. The R-squared of the SDM model is 0.866, which is higher than that of the SAR model. Taken together, the SDM model is relatively better.
It can be seen from Table 5 that, after using the inverse distance weight matrix (W2), the goodness of fit improves markedly and the significance level of each variable rises. Considering that the propagation of COVID-19 is related both to the distance between countries and to whether they are adjacent, the SDM model based on the composite spatial weight matrix W3 is selected as the optimal estimation at a comparable level of significance.
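For reference, the three candidate specifications can be written in the standard textbook notation below. The symbols are assumptions introduced here for illustration and may differ from the notation used elsewhere in the paper: y_t is the vector of infection rates of the 14 countries on day t, X_t collects confirmed/10,000, recovered/1000, and deaths/1000, W is one of W1, W2, or W3, and any fixed-effect terms are omitted.

```latex
\begin{aligned}
\text{SAR:}\quad & y_t = \rho W y_t + X_t \beta + \varepsilon_t \\
\text{SEM:}\quad & y_t = X_t \beta + u_t, \qquad u_t = \lambda W u_t + \varepsilon_t \\
\text{SDM:}\quad & y_t = \rho W y_t + X_t \beta + W X_t \theta + \varepsilon_t
\end{aligned}
```

Setting \theta = 0 in the SDM recovers the SAR model, which is why the SDM can be treated as an extension of the spatial lag model selected by the LM test.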
The specific analysis of the SDM model without fixed effects is as follows. The coefficient of the number of confirmed cases is positive: an increase in the number of confirmed cases raises the infection rate. Cluster infections are more likely to occur before an infection is diagnosed, so the number of confirmed cases is positively correlated with the infection rate. The coefficient of the number of recoveries is negative: an increase in the number of recovered people reduces the infection rate of COVID-19. A rising number of recoveries indicates that antibodies formed in infected patients can resist the virus and that medical research has made further progress, so the number of recoveries is negatively correlated with the infection rate. The coefficient of the number of deaths is positive: an increase in the number of deaths indicates a higher incidence of COVID-19 infection. When more and more infected people die, this may indicate the collapse of the medical system, a rising infection rate among medical staff, failure to handle corpses properly, and an increase in sources of infection. Therefore, the number of deaths is positively correlated with the infection rate.

Discussions
Countries differ in their epidemic prevention measures, geographical locations, and population sizes. In order to study the spatial transmission of COVID-19 among these countries, we estimate the spatial model on data rescaled to a common order of magnitude. In this paper, the daily number of people infected with COVID-19 is selected as the dependent variable. Based on the analysis of the model and data, we find that the numbers of people diagnosed, recovered, and deceased on the same day are closely related to the daily infection rate of COVID-19, so they are used as explanatory variables. Before estimating the econometric model, we first preprocess the COVID-19 data so that all variables are of the same order of magnitude.
Taking into account the incubation period of COVID-19, the 14-day increase in infections is divided by 14 and by the total population of the country (in billions); the daily number of diagnosed cases is divided by 10,000, the number of recovered cases by 1000, and the number of deaths by 1000. We therefore use the number of confirmed cases divided by 10,000, the number of recovered cases divided by 1000, and the number of deaths divided by 1000 to predict the infection rate divided by 10 on that day.
Over the 21 days, the Moran index based on the binary spatial weight matrix (W1), which encodes spatial adjacency, ranged between 0.38 and 0.56, indicating that the spread of COVID-19 among the 14 countries is spatially correlated. In practice, the spread of COVID-19 also depends on factors such as the distance between countries. We therefore introduced the inverse distance spatial weight matrix (W2), and the results show that the spread of COVID-19 among the 14 countries is strongly correlated, which is consistent with the actual situation. Finally, we chose the composite spatial weight matrix (W3), which takes both factors into account. The results show that the transmission of COVID-19 between the 14 countries is indeed related to both distance and adjacency. The Moran index under the binary spatial weight matrix (W1) is relatively low, indicating that the composite spatial weight matrix (W3) is the more effective choice of spatial weight matrix. That is, countries with high infection rates tend to have neighboring countries with high infection rates, while countries with low infection rates tend to have neighboring countries with low infection rates.
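The three kinds of weight matrix discussed above can be sketched as follows. The binary adjacency and inverse distance constructions are standard. The rule for combining them is not restated in this section, so the `composite` function below assumes a simple weighted average of the two row-standardized matrices; this is an illustrative assumption and not necessarily how W3 is defined in this study. The three-unit coordinates and border list are hypothetical.

```python
import math


def row_standardize(w):
    """Scale each row to sum to 1 (rows with no weights stay all zero)."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def binary_adjacency(n, borders):
    """W1-style matrix: 1 if two units share a border, else 0."""
    w = [[0.0] * n for _ in range(n)]
    for i, j in borders:
        w[i][j] = 1.0
        w[j][i] = 1.0
    return w


def inverse_distance(coords):
    """W2-style matrix: 1 / distance between distinct units (planar coords)."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Assumes no two units share the same coordinates.
                w[i][j] = 1.0 / math.dist(coords[i], coords[j])
    return w


def composite(w1, w2, alpha=0.5):
    """ASSUMED combination: weighted average of row-standardized W1 and W2."""
    a, b = row_standardize(w1), row_standardize(w2)
    n = len(a)
    return [[alpha * a[i][j] + (1 - alpha) * b[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical example: three units on a line; unit 1 borders units 0 and 2.
demo_coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
demo_w1 = binary_adjacency(3, [(0, 1), (1, 2)])
demo_w2 = inverse_distance(demo_coords)
demo_w3 = composite(demo_w1, demo_w2)
for row in demo_w3:
    print([round(v, 3) for v in row])
```

In this toy case unit 0 gives unit 2 a positive weight under the composite matrix even though they share no border, which is the sense in which a composite matrix accounts for both adjacency and distance.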
According to the LM test results, the statistical analysis favors the spatial lag model. Because the empirical analysis adopts the spatial lag model, it is best to extend it to the spatial Durbin model. Therefore, the SAR model and the SDM model are analyzed based on three different spatial weight matrices, and the regression results under the different matrices are evaluated. We find that the spatial model based on W3 has the better overall performance. Under the same spatial weight matrix, the fixed-effect spatial lag model and the fixed-effect spatial Durbin model are each estimated. Table 5 shows that, based on the spatial weight matrix W1, the SAR model and the SDM model can explain the correlation between countries. Although this correlation is weak, it remains an indispensable factor when considering the spatial correlation of COVID-19 across countries. Based on the spatial weight matrix (W2), both the SAR model and the SDM model show that the infection rates among the 14 countries have significant positive spatial spillover effects. Based on the composite spatial weight matrix (W3), both the SAR model and the SDM model show a strong correlation between infection rates in the 14 countries. That is, an increase in the infection rate of one country promotes an increase in the infection rate of its neighboring countries, a significant positive correlation. Finally, the SAR model and the SDM model are compared under the same spatial weight matrix W3. Judging from the maximum likelihood estimates, the economic meaning of the parameter estimates obtained from the spatial Durbin model is more consistent with theory, and its R-squared is higher.
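As an illustration of the maximum likelihood estimation mentioned above, the sketch below fits a cross-sectional spatial lag model by grid search over the concentrated log-likelihood, then fits a Durbin-type model by adding the spatially lagged regressor. It is a simplified sketch without fixed effects or panel structure, so it is not the estimator behind Table 5. The function name `fit_sar_ml`, the ring-shaped weight matrix, and the simulated data are hypothetical.

```python
import numpy as np


def fit_sar_ml(y, X, W, grid=None):
    """Concentrated ML for y = rho * W y + X beta + eps (cross-section)."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 397)  # candidate rho values
    n = len(y)
    Wy = W @ y
    # Regress y and W y on X; their residuals define the concentrated likelihood.
    b_y = np.linalg.lstsq(X, y, rcond=None)[0]
    b_wy = np.linalg.lstsq(X, Wy, rcond=None)[0]
    e_y, e_wy = y - X @ b_y, Wy - X @ b_wy
    best_ll, best_rho = -np.inf, 0.0
    for rho in grid:
        resid = e_y - rho * e_wy
        sigma2 = resid @ resid / n
        sign, logdet = np.linalg.slogdet(np.eye(n) - rho * W)
        ll = logdet - 0.5 * n * np.log(sigma2)  # constants dropped
        if sign > 0 and ll > best_ll:
            best_ll, best_rho = ll, rho
    beta = b_y - best_rho * b_wy
    return float(best_rho), beta


# Hypothetical simulated data: 100 units on a ring, true rho = 0.5.
rng = np.random.default_rng(42)
n_units = 100
W_ring = np.zeros((n_units, n_units))
for i in range(n_units):
    W_ring[i, (i - 1) % n_units] = 0.5
    W_ring[i, (i + 1) % n_units] = 0.5
x = rng.normal(size=n_units)
X_sar = np.column_stack([np.ones(n_units), x])
noise = 0.1 * rng.normal(size=n_units)
y_sim = np.linalg.solve(np.eye(n_units) - 0.5 * W_ring,
                        X_sar @ np.array([1.0, 2.0]) + noise)

rho_sar, beta_sar = fit_sar_ml(y_sim, X_sar, W_ring)
# Durbin-type extension: add W x (the intercept is not lagged, since W 1 = 1).
X_sdm = np.column_stack([X_sar, W_ring @ x])
rho_sdm, beta_sdm = fit_sar_ml(y_sim, X_sdm, W_ring)
print("SAR:", rho_sar, beta_sar)
print("SDM:", rho_sdm, beta_sdm)
```

Because the Durbin-type model only adds spatially lagged regressors to the design matrix, the same routine handles both models, and the two fits can then be compared on goodness of fit.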

Conclusions
In order to study the global transmission trend of COVID-19, the infection rate, the number of confirmed cases, the number of recovered cases, and the number of deaths in 14 representative countries over an early 21-day period are selected for analysis and modeling. In addition, spatial variables are introduced into the model using methods from spatial econometrics. Compared with the dynamic model named SAPHIRE proposed by Hao Xingjie et al. [42], the spatial econometric model introduces spatial variables to better explain the spatial effect of COVID-19 propagation.
In this paper, we propose a novel spatial weight matrix for characterizing the spatial dynamics of the distribution of COVID-19 infections, assessed through the Moran index test. We compare the new weight matrix with classical spatial weight matrices by applying them to the COVID-19 infection series of 14 typical countries around the world. We then apply the Lagrange multiplier test and a spatial model to investigate how COVID-19 spread.
For the Moran index test, our results indicate that the COVID-19 patient series are spatially correlated across the 14 typical countries. For the Lagrange multiplier test, our results show that COVID-19 patients can infect others with a lag. For the spatial model, our results give the relationship among new COVID-19 patients, confirmed patients, recovered patients, and deaths from COVID-19. We hope that future research can build on this study by considering more variables and choosing a more suitable model for analysis.