The Influence of Potential Infection on the Relationship between Temperature and Confirmed Cases of COVID-19 in China

Abstract: Considering the impact of the number of potential new coronavirus infections in each city, this paper explores the relationship between temperature and cumulative confirmed cases of COVID-19 in mainland China using a non-parametric method. The floating population in Wuhan that originates from each city is taken as a proxy variable for the number of potential new coronavirus infections in that city. First, to apply the non-parametric method correctly, the symmetric Gauss kernel and the asymmetric gamma kernel are used to estimate the density of cumulative confirmed cases of COVID-19 in China. The results confirm that the gamma kernel provides a more reasonable density estimate for bounded data than the Gauss kernel. Then, using the non-parametric method based on gamma kernel estimation, this paper finds a positive relationship between Wuhan's mobile population and cumulative confirmed cases, whereas the relationship between temperature and cumulative confirmed cases in China is inconclusive once the number of potential new coronavirus infections in each city is considered. Compared with the weather, the potentially infected population plays a more critical role in spreading the virus. Therefore, prevention and control measures matter more than weather factors, and attention to epidemic prevention and control is still needed even in summer.


Introduction
A pneumonia caused by a new coronavirus, called COVID-19, broke out in Wuhan, Hubei Province, China, on 31 December 2019. Cases then spread to other cities in China and to other countries, and the outbreak turned into a pandemic. As of 13 May 2020, China had 84,458 cumulative confirmed cases, and 4,250,812 cases had been diagnosed worldwide. The new coronavirus caused a social shutdown that led to a decline in most industries, with a few exceptions such as gaming, eSports, and online education [1][2][3]. More worrying still, the new coronavirus may accompany humans for a long time [4]. The COVID-19 epidemic has put severe pressure on the long-term accumulation of global innovation and hindered the innovation ability of enterprises, which is an important factor in a country's sustainable development [5][6][7]. Therefore, to better formulate epidemic prevention measures and restore economic development as soon as possible, more attention must be paid to subjects such as the rate of spread of the epidemic, transmission routes, prevention methods, the survival time of the virus in the environment, and the effects of environmental factors on the infection rate [8][9][10].
Previous studies have shown that meteorological variables can affect the transmission and survival of coronaviruses. For example, Pirouz et al. [8] used data from Iran, Italy, Germany, Spain, and the United States to conclude that there is a certain negative correlation between daily average temperature and the prevalence of the coronavirus. Preliminary evidence from Bannister et al. [11] shows that, among the cases reported globally before 29 February 2020, higher temperatures are associated with a lower incidence of COVID-19. In addition, Mofijur et al. [12] found that the average temperature was significantly associated with new COVID-19 cases in Dhaka, Bangladesh. Prata et al. [13] indicated that temperature had a negative linear relationship with the number of confirmed cases in Brazil. Conversely, some studies have found no significant relationship between temperature and the transmission of COVID-19 [14,15]. Hence, there is still no clear evidence of a negative correlation between environmental variables and transmission. One of the most important reasons for these unclear conclusions could be that previous studies did not consider the number of potential infections, which would be the most critical factor in the transmission of COVID-19. In addition, most previous studies used traditional models that are too simple to capture complex nonlinear relationships, while environmental factors often have unknown nonlinear effects [16,17]. Currently, a non-parametric approach, commonly known as the kernel method, is used to describe the association between variables. For example, Fan et al. [18] applied the kernel method to compare PM2.5 density estimates between summer and winter, and between rush and non-rush hours.
Unlike the parametric method, which yields only mean information about the variables, the kernel method allows the relationship between variables to be analyzed comprehensively by comparing their distributions, which gives a better and more robust description of that relationship. Another benefit of the kernel method is that relationships between variables can be explored visually.
Although there is relevant literature on the relationship between temperature and confirmed cases in China [19,20], it does not consider the number of potential infections, which would be the most critical factor in the transmission of COVID-19. Hence, taking into account the number of potential new coronavirus infections in each city, this study aims to explore the relationship between temperature and cumulative confirmed cases of COVID-19 in China with a kernel method, in order to improve our knowledge of the spread of the virus. In this paper, the floating population of each city in Wuhan refers to the part of Wuhan's floating population that originates from that city, and it is taken as the proxy variable for the number of potential new coronavirus infections. Wuhan was the first city in China where the new coronavirus broke out, and the outbreak occurred shortly before China's Lunar New Year, when Wuhan's floating population was returning to their hometowns for the holiday. Moreover, because the Chinese government's prevention measures were timely, population flows between cities across the country essentially stopped after Wuhan was locked down on 23 January 2020. Therefore, the floating population in Wuhan from each city can be regarded as a proxy variable for the number of potential infections in that city. Specifically, based on the non-parametric method, this paper explores the relationship between temperature and cumulative confirmed cases of COVID-19 in mainland China. First, since the non-parametric method requires the density of the data to be estimated correctly, we compare symmetric kernel and asymmetric gamma kernel density estimates of the cumulative confirmed cases and find that the asymmetric gamma kernel fits the data better. Next, by applying the asymmetric gamma kernel, this paper estimates the probability density of cumulative confirmed cases every 14 days from 24 January to 20 March 2020.
The results show that measures such as Wuhan's lockdown and strict epidemic prevention in other cities were effective. Then, based on the copula model, the multivariate density of temperature and cumulative confirmed cases is estimated. Compared with the benchmark multivariate density, the preliminary results show a certain correlation between the cumulative number of confirmed cases and the urban temperature in China. Finally, this paper compares the density estimates of cumulative confirmed cases of COVID-19 between cities with small and large mobile populations from Wuhan, and between cities with low and high temperatures. The results show that Wuhan's mobile population is positively related to cumulative confirmed cases of COVID-19, while the relationship between temperature and the number of cumulative confirmed cases in mainland China is inconclusive when the number of potential new coronavirus infections in each city is considered.
The rest of the paper is organized as follows. Section 2 describes nonparametric density estimation methods. Section 3 presents the empirical results. Section 4 concludes.

Nonparametric Density Estimation
Let $X_1, X_2, \ldots, X_n$ be a random sample from a probability distribution with an unknown probability density function $f_X(x)$.

Symmetric and Asymmetric Kernel Density Estimators
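For any $x \in \chi$, where $\chi$ is the unbounded support, the conventional fixed-bandwidth symmetric kernel estimator of the unknown $f_X(x)$ is as follows:
$$\hat{f}_X(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \tag{1}$$
where $K(\cdot)$ is a kernel function and $h$ is the bandwidth. A kernel function with a symmetric density satisfies the following:
$$K(u) \ge 0, \qquad \int K(u)\,du = 1, \qquad K(-u) = K(u).$$
The Gauss kernel is the most widely used symmetric kernel and is expressed as:
$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right).$$
Equation (1) is a consistent estimator of the true density function when $h \to 0$ and $nh \to \infty$ as $n \to \infty$ [21][22][23].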
When the unknown density has support on $[0, \infty)$, the gamma kernel estimator, one of the asymmetric kernel density estimators, is given by the following:
$$\hat{f}_X(x) = \frac{1}{n} \sum_{i=1}^{n} K_{x/h+1,\,h}(X_i),$$
where:
$$K_{x/h+1,\,h}(t) = \frac{t^{x/h}\, e^{-t/h}}{h^{x/h+1}\, \Gamma(x/h+1)}, \qquad t \ge 0.$$
Remark. Standard fixed-bandwidth symmetric kernel density estimators are known to encounter boundary problems for positive random variables with a large probability mass close to zero. In such settings, asymmetric gamma kernel estimators are superior alternatives for the following reasons [24,25]: (i) the gamma kernel estimator is non-negative and free of boundary bias; (ii) the shape of the gamma kernel function changes with the position of the sample points, so the smoothness at each estimation point is adjusted naturally.
Moreover, the farther the points of estimation move away from the boundary, the more the estimator's variance decreases. This is an advantage when the design points are naturally unbalanced and scattered, in particular for densities with sparse areas.
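To make the boundary problem concrete, the following minimal sketch evaluates a Gaussian kernel estimate and a gamma kernel estimate on the same non-negative sample. It is an illustration only: the sample is synthetic (exponentially distributed), the bandwidth value h = 5 is arbitrary, the gamma kernel is assumed to be a Gamma(x/h + 1, h) density evaluated at the data points, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np
from scipy.stats import gamma, norm


def gauss_kde(x, sample, h):
    """Fixed-bandwidth Gaussian kernel estimate of the density at points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - sample[None, :]) / h
    return norm.pdf(u).mean(axis=1) / h


def gamma_kde(x, sample, h):
    """Gamma kernel estimate: mean of Gamma(x/h + 1, scale=h) pdfs at the data.

    Assumed form of the gamma kernel; x must be non-negative.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    shape = x[:, None] / h + 1.0
    return gamma.pdf(sample[None, :], a=shape, scale=h).mean(axis=1)


# Synthetic, non-negative data with a large mass near zero (not the paper's data).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=20.0, size=300)
h = 5.0  # arbitrary illustrative bandwidth

grid = np.linspace(0.0, 200.0, 401)
f_gauss = gauss_kde(grid, sample, h)
f_gamma = gamma_kde(grid, sample, h)

# Probability mass the Gaussian estimate assigns to the impossible region x < 0:
# each kernel centred at X_i contributes Phi(-X_i / h).
leak = norm.cdf(-sample / h).mean()
print(f"Gaussian kernel mass below zero: {leak:.3f}")
```

The quantity `leak` is the share of probability the symmetric estimator places on negative values, which is the source of the underestimation near the boundary; the gamma kernel estimate is only defined on the non-negative half-line and therefore has no such leakage.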

Selection of Bandwidth
This paper selects the bandwidth $h$ of the symmetric kernel density estimator in Equation (1) by the least squares cross-validation method [18]. The bandwidth $h$ of the gamma density estimator is obtained by minimizing a criterion function.
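As an illustration of least squares cross-validation for the symmetric case, the sketch below minimizes the textbook criterion LSCV(h) = ∫ f̂(x)² dx − (2/n) Σ f̂₋ᵢ(Xᵢ), where f̂₋ᵢ is the leave-one-out estimate. This form is assumed here and is not copied from the paper. For a Gaussian kernel the integral has a closed form, which the code uses. The sample is synthetic, the search interval (0.05, 10) is an arbitrary assumption, and `lscv_gauss` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm


def lscv_gauss(h, sample):
    """Least squares cross-validation score for a Gaussian kernel with bandwidth h."""
    n = sample.size
    d = sample[:, None] - sample[None, :]  # pairwise differences X_i - X_j
    # Integral of the squared estimate: pairwise N(0, 2 h^2) densities.
    integral_f2 = norm.pdf(d, scale=np.sqrt(2.0) * h).sum() / n**2
    # Leave-one-out estimates at the data points (drop the i == j term).
    k = norm.pdf(d, scale=h)
    loo = (k.sum(axis=1) - norm.pdf(0.0, scale=h)) / (n - 1)
    return integral_f2 - 2.0 * loo.mean()


# Synthetic unbounded data (for example, temperatures); not the paper's data.
rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=4.0, size=200)

res = minimize_scalar(lscv_gauss, bounds=(0.05, 10.0), args=(sample,), method="bounded")
h_cv = res.x
print(f"cross-validated bandwidth: {h_cv:.3f}")
```

A bounded one-dimensional search is used because the criterion is only meaningful for positive bandwidths; in practice the bounds would be chosen relative to the spread of the data.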

Semiparametric Multivariate Density Estimation
For the joint probability density kernel estimation of the temperature and the number of cumulative confirmed cases, the semi-parametric multivariate density estimation for positive data is used here [26]. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a probability distribution with an unknown probability density function $f_Y(y)$.
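From Sklar (1959), it is well known that the joint distribution function of a vector $(X, Y)$ can be expressed via a copula $C$ [27][28][29][30][31]:
$$F(x, y) = C\left(F_X(x), F_Y(y)\right). \tag{5}$$
Differentiating both sides of (5) with respect to $x$ and $y$, we obtain:
$$f(x, y) = f_X(x)\, f_Y(y)\, c\left(F_X(x), F_Y(y)\right), \tag{6}$$
where $c(u, v) = \partial^2 C(u, v) / \partial u\, \partial v$ is the copula density. The Gumbel-Hougaard copula is used here:
$$C(u, v) = \exp\left\{-\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{1/\theta}\right\}, \qquad \theta \ge 1.$$
The estimate of (6) is given by the following estimation steps [23]: (2) The distribution functions of $X$ and $Y$ are estimated by the empirical distribution.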
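The decomposition of the joint density into marginals and a copula density can be sketched as follows. This is a hedged illustration, not the authors' implementation: the copula parameter theta is simply passed in (how it is estimated is not specified above), the empirical distribution is rescaled by n + 1 so that it stays strictly inside (0, 1), the marginals use a Gaussian kernel for the unbounded variable and an assumed Gamma(y/h + 1, h) kernel for the non-negative variable, and all data and bandwidths are synthetic placeholders.

```python
import numpy as np
from scipy.stats import gamma, norm


def gumbel_copula_density(u, v, theta):
    """Density of the Gumbel-Hougaard copula; theta = 1 gives independence."""
    x, y = -np.log(u), -np.log(v)
    s = x**theta + y**theta
    c_uv = np.exp(-s ** (1.0 / theta))  # copula value C(u, v)
    return (c_uv * (x * y) ** (theta - 1.0) / (u * v)
            * s ** (2.0 / theta - 2.0)
            * (1.0 + (theta - 1.0) * s ** (-1.0 / theta)))


def ecdf(t, sample):
    """Empirical distribution function, rescaled by n + 1 and kept inside (0, 1)."""
    n = sample.size
    rank = np.searchsorted(np.sort(sample), t, side="right")
    return np.clip(rank / (n + 1.0), 1.0 / (n + 1.0), n / (n + 1.0))


def joint_density(x, y, xs, ys, hx, hy, theta):
    """Semiparametric estimate f(x, y) = f_X(x) f_Y(y) c(F_X(x), F_Y(y))."""
    fx = norm.pdf((x - xs) / hx).mean() / hx             # Gaussian kernel marginal
    fy = gamma.pdf(ys, a=y / hy + 1.0, scale=hy).mean()  # gamma kernel marginal
    return fx * fy * gumbel_copula_density(ecdf(x, xs), ecdf(y, ys), theta)


# Synthetic placeholders for temperature (unbounded) and case counts (non-negative).
rng = np.random.default_rng(2)
temps = rng.normal(6.0, 8.0, size=150)
cases = rng.exponential(30.0, size=150)

f_dep = joint_density(5.0, 20.0, temps, cases, hx=2.0, hy=5.0, theta=1.5)
f_ind = joint_density(5.0, 20.0, temps, cases, hx=2.0, hy=5.0, theta=1.0)
print(f_dep, f_ind)
```

Setting theta = 1 makes the copula density identically one, so `f_ind` is the product of the two marginal estimates; comparing it with `f_dep` shows how the copula term reweights the independent case.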

Empirical Findings
Daily data on cumulative confirmed cases of COVID-19 in mainland Chinese cities from 23 January to 20 March 2020 were gathered from the website (https://lab.isaaclin.cn/nCoV/, accessed on 5 January 2020). $X_i$ represents the cumulative confirmed cases in the i-th city.

Density Estimation of Cumulative Confirmed Cases
The Gauss kernel and gamma kernel densities of COVID-19's cumulative confirmed cases in mainland China on 20 March 2020 are investigated first. As shown in Figure 1, the density curve of the Gauss kernel is rather wiggly, which is an unexpected shape for a density curve in reality. Meanwhile, the Gauss kernel leads to an underestimate because it gives weight to negative values, while the cumulative confirmed cases are non-negative. In contrast, the gamma kernel density estimator is non-negative and its shape changes with the position of the sample points, so the smoothness at each estimation point is adjusted naturally. From Figure 1, the gamma density estimator shows a large probability mass close to zero. Meanwhile, we can observe that the shoulder condition is satisfied here, which means that the gamma kernel estimator is free of the boundary problem [25,31].
To sum up, the gamma kernel generates a positive and reasonably smooth density without a boundary problem for the cumulative confirmed cases. Therefore, all the following analyses proceed with the gamma kernel density estimation.
The gamma kernel estimates of the cumulative confirmed cases for every 14-day interval since 24 January 2020 are displayed in Figure 2. The result shows that the density curve of the cumulative confirmed cases on 7 February 2020 is higher than that on 24 January 2020 on the support [11, +∞). That is, the number of cumulative confirmed cases increases quickly in the first 14-day interval. The increase is then gradually reduced over the following 14-day intervals through severe measures such as Wuhan's lockdown. Governments at all levels investigated suspected cases and their contacts, especially people from the "epidemic area" (Hubei) and from villages and communities under lockdown. That is, the measures were sufficient to control the spread of COVID-19.

Relationship of Wuhan's Mobile Population and Cumulative Confirmed Cases
With the coming of the Spring Festival (25 January 2020), many migrant workers returned to their hometowns from Wuhan [32]. The mobile population is a potential carrier for virus transmission. Hence, this paper explores the correlation between Wuhan's migrants and the cumulative confirmed cases by comparing the densities of cities with different sizes of mobile population from Wuhan. According to the standard defined by Fan et al. [32], we divide the sample into two sub-groups: cities with a large mobile population from Wuhan (LMG) and cities with a small mobile population from Wuhan (SMG).
As the cumulative confirmed cases in mainland China had become relatively stable, we only investigated the density curve of cumulative confirmed cases on 20 March 2020. The result is displayed in Figure 3. It shows that the density of LMG is significantly higher than that of SMG over the support [20, +∞), which indicates that Wuhan's mobile population was positively correlated with the cumulative confirmed cases of COVID-19. That is, it is necessary for other regions to adopt 14-day isolation and observation measures for personnel from Hubei, especially from Wuhan.
Sustainability 2021, 13, x FOR PEER REVIEW

Preliminary Relationship of Temperature and Cumulative Confirmed Cases Based on Multivariate Density
Temperature, one of the critical environmental factors, is a non-negligible factor influencing the coronavirus's behavior [8]. We need to estimate the probability density of city-level temperatures accurately.
As shown in Figure 4, there is a north-south difference in city-level air temperature in China. If the probability density of temperature in Chinese cities is estimated by a parametric model, this difference cannot be reflected. Moreover, temperature can be regarded as "unbounded" data, so the traditional symmetric kernel is suitable for estimating its probability density (Figure 5).
Based on formula (6), we estimate the multivariate density of temperature and the cumulative number of confirmed cases. The marginal density of temperature is estimated with the Gaussian kernel, and the marginal density of cumulative confirmed cases is estimated with the gamma kernel. The benchmark multivariate kernel density is estimated under the assumption f(x, y) = f(x)f(y), which means that X is independent of Y. If the multivariate density of temperature and cumulative confirmed cases differs from its benchmark, then we initially consider that these two factors are related.
The contrast between Figure 6a,c and Figure 6b,d shows that the graph in Figure 6b,d is symmetrical, whereas the graph in Figure 6a,c has a clockwise-rotated shape. Therefore, we initially believed that there was a certain relationship between the temperature and the cumulative confirmed cases.

Relationship of Temperature and Cumulative Confirmed Cases
To investigate the relationship between the temperature and the number of COVID-19's cumulative confirmed cases in China, we divided the sample into two sub-groups according to temperature. A city is classified into the high-temperature group (HTG) if its temperature is equal to or higher than 6 °C; otherwise, it is classified into the low-temperature group (LTG). The kernel density curve (see Figure 5) of daily average temperatures in January and February 2020 for 179 cities in mainland China has two peaks, and T = 6 °C lies midway between them. Hence, we take it as the threshold.
The kernel densities of the two sub-groups are shown in Figure 7. The results show that the density of the cumulative confirmed cases of HTG is relatively higher than that of LTG on the support [15, +∞) excluding (45, 70). This indicates a positive relationship between the temperature and cumulative confirmed cases, which is contrary to the results of Pirouz et al. [8]. A possible explanation is that the cities with higher temperatures are almost all coastal cities, and most of them are economically developed regions with higher population mobility.
To examine this explanation, the relationship between temperature and cumulative confirmed cases is investigated while controlling for population mobility. Figures 8 and 9 display the kernel densities of HTG and LTG within LMG and SMG, respectively. Within LMG, a negative correlation between temperature and cumulative confirmed cases exists on the support (22, 78), while the correlation turns positive on the support (78, +∞). Within SMG, a weak positive correlation between temperature and cumulative confirmed cases exists. Hence, the relationship between temperature and cumulative confirmed cases is inconclusive. Other factors, such as population density, may have been overlooked and should be studied in future research.
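The stratified comparison described above can be sketched as follows. This is a minimal illustration on synthetic data: the 6 °C threshold and the 179 cities come from the text, but the inflow variable, the median split used to define LMG and SMG (the paper instead follows the standard of Fan et al. [32]), the bandwidth, the assumed Gamma(x/h + 1, h) kernel, and all variable names are assumptions made only for this example.

```python
import numpy as np
from scipy.stats import gamma


def gamma_kde(grid, sample, h):
    """Gamma kernel density estimate on a non-negative grid (assumed kernel form)."""
    shape = grid[:, None] / h + 1.0
    return gamma.pdf(sample[None, :], a=shape, scale=h).mean(axis=1)


# Synthetic city-level data; none of these values come from the paper's data set.
rng = np.random.default_rng(3)
n = 179
temperature = rng.normal(6.0, 8.0, size=n)           # degrees Celsius
inflow = rng.lognormal(mean=8.0, sigma=1.0, size=n)  # hypothetical inflow from Wuhan
cases = rng.poisson(lam=0.01 * inflow).astype(float)

high_temp = temperature >= 6.0              # HTG if True, LTG otherwise
large_inflow = inflow >= np.median(inflow)  # assumed median split: LMG vs SMG

grid = np.linspace(0.0, 200.0, 201)
curves = {}
for mob_name, mob_mask in (("LMG", large_inflow), ("SMG", ~large_inflow)):
    for temp_name, temp_mask in (("HTG", high_temp), ("LTG", ~high_temp)):
        subset = cases[mob_mask & temp_mask]
        curves[(mob_name, temp_name)] = gamma_kde(grid, subset, h=5.0)

# Within the large-inflow group, positive values mark regions of the support
# where the HTG density lies above the LTG density.
diff_lmg = curves[("LMG", "HTG")] - curves[("LMG", "LTG")]
```

Inspecting the sign of `diff_lmg` along the grid corresponds to reading off the intervals of the support on which one density curve lies above the other, which is how the figures are interpreted in the text.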

Conclusions
Considering the impact of the number of potential new coronavirus infections in each city, this paper explores the relationship between temperature and cumulative confirmed cases of COVID-19 in mainland China using a non-parametric method. The floating population in Wuhan that originates from each city is taken as a proxy variable for the number of potentially infected people in that city. To apply the non-parametric method correctly, this paper uses a symmetric kernel and an asymmetric gamma kernel to estimate the probability density of cumulative confirmed cases of COVID-19 in mainland China. The results show that the asymmetric gamma kernel provides a more reasonable fit for COVID-19's cumulative confirmed cases.
By comparing the densities of COVID-19's cumulative confirmed cases between LMG and SMG, and between HTG and LTG, we find that Wuhan's mobile population is positively related to cumulative confirmed cases. Moreover, the preliminary result based on the copula method shows a certain correlation between the cumulative number of confirmed cases and the urban temperature in China. However, the relationship between temperature and cumulative confirmed cases is inconclusive when the number of potential new coronavirus infections in each city is considered. Compared with temperature, the potentially infected population plays a more important role in spreading the virus. Therefore, prevention and control measures matter more than weather factors, and attention to epidemic prevention and control is still needed even in summer.
Our results do not cover a wide range of temperatures and their effects, because if we expand the data, for example into the summer period, we cannot reasonably obtain proxy variables for the number of potential infections. The goal of our next study is to find a proxy variable for the number of potentially infected people in a general setting or in a high-temperature metropolis.

Conflicts of Interest:
The authors declare no conflict of interest.