Discovering Correlations between the COVID-19 Epidemic Spread and Climate

The outbreak of Corona Virus Disease 2019 (COVID-19) has affected the lives of people all over the world. It is urgent and important to analyze how the epidemic spreads and to support the implementation of epidemic prevention measures. In countries with more than 10,000 confirmed cases, we find moderate to high correlations between the number of newly diagnosed cases per day and both temperature and relative humidity. Governments can adjust epidemic prevention measures according to climate change, which may help control the spread of COVID-19 more effectively.


Introduction
Corona Virus Disease 2019 (COVID-19), a pneumonia caused by SARS-CoV-2 infection, is a global pandemic and a serious threat to human health that has halted economic activities [1]. The COVID-19 outbreak has become a global public health emergency [2]. Since December 2019, there have been many cases of pneumonia caused by this virus all over the world. The COVID-19 pandemic forced many countries to implement full or partial lockdowns, which substantially reduced anthropogenic activities: outdoor movement was restricted, transportation decreased and industries shut down [3]. On 11 March 2020, the World Health Organization (WHO) declared COVID-19 a global pandemic [4]. To date, COVID-19 infection has been reported worldwide and may be further transmitted by air travel [5]. Many scholars are currently conducting research on the spatial spread of COVID-19. For example, Jelodar used an LSTM Recurrent Neural Network (RNN) to carry out deep sentiment classification for COVID-19 [6]; Alsaeedy used existing cellular wireless network functions to detect COVID-19 risk areas [7].
According to the data in the "Baidu COVID-19 Epidemic Real-time Big Data Report", from 15 February to 22 June the total number of confirmed COVID-19 cases in the United States exceeded 2.31 million, and the cumulative number of deaths reached 123,053. The cumulative number of confirmed COVID-19 cases in Brazil exceeded 1.1 million, and the cumulative number of deaths reached 51,271. As of 22 June, three countries in the southern hemisphere were among the top five in the daily list of newly diagnosed cases. On 22 June, there were 26,616 new cases in Brazil. Therefore, there is an urgent need to evaluate the correlation between the spread of COVID-19 and climate conditions, so as to better support decision making and response measures and to control the further spread of COVID-19.

Research on Epidemic Situation and Climate
Since 2000, Severe Acute Respiratory Syndrome (SARS), bird flu and other epidemics have broken out successively. The SARS epidemic broke out in Guangdong, China in mid-November 2002, and then spread to other parts of China and the world. After a series of countermeasures, SARS disappeared in mid-June 2003. SARS lasted for about 7 months, and reached its peak from April to May 2003. Bird flu epidemics mostly occur in winter and spring, and they all end in late spring to early summer. SARS was first discovered in 2003 in Guangdong and Hong Kong, China, which have a warm climate with an average temperature above 10°C from January to February. Bird flu was first discovered in 2004 in Guangxi, Hubei and other provinces with a mild climate, where the average temperature from January to February is around 10°C. The COVID-19 epidemic was first reported in Wuhan, Hubei Province, where the average temperature from January to February was also around 10°C [8].
In China, epidemics such as SARS and bird flu share the characteristics of beginning in winter, ending in summer, and originating in southern China. These viruses have high activity and transmission capacity in low-temperature and high-humidity environments [9]. Some references suggest that an increase in temperature causes the virus to lose its infective activity.
At present, there have been some studies on the correlation between the spread of the COVID-19 epidemic and climate. Zhu et al. [10] confirmed a highly significant correlation between absolute humidity and daily new COVID-19 cases using a Multiple Linear Regression Model, after collecting the daily number of new cases and corresponding climate data from eight regions in four countries in South America. David et al. [11] utilized the generalized additive model (GAM) to explore the linear and non-linear relationship between the annual average temperature compensation and the confirmed COVID-19 cases in the capital city of Brazil. It was found that the daily cumulative number of confirmed cases decreased by 4.8951% when the temperature increased by 1°C. Goswami [12] used Sen's slope, the Mann-Kendall test and a generalized additive model (GAM) to detect the impact of daily temperature and relative humidity on the incidence rate in India. Lowen [13], Barreca [14] and Żuk [15] pointed out that environmental temperature plays an important role in the survival and transmission of viruses.
A large number of studies reveal that temperature and humidity can affect the spread of epidemics, thus prompting this study to further explore the global impact of environmental factors on COVID-19.
However, the samples selected in the current literature are all limited to local areas, so the conclusions may not be universal. Therefore, this study collected epidemic and meteorological data for the high-incidence areas of the global epidemic and analyzed the development trend of the epidemic on a global scale, including more climatic conditions that may affect the spread of the virus, so as to study the climatic factors affecting the activity of SARS-CoV-2.

Multiple Regression Analysis
Multiple Linear Regression Models are suitable for scenarios where multiple variables affect a single outcome. They can measure the degree of correlation and the goodness of fit of the regression for each variable, and can improve the performance of the prediction model. In this study, climate factors affect the spread of the epidemic in many ways. Because it was necessary to evaluate the correlation between various climatic factors and the spread of SARS-CoV-2, a Multiple Linear Regression Model was selected for analysis.
Multiple Regression Analysis Models have been widely used in various COVID-19 scenarios. Rath [16] used a Multiple Linear Regression Model to predict that the number of daily active cases in India would reach 52,290 by 15 August. Ayyoubzadeh et al. [17] used a Multiple Linear Regression Method to predict the spread of COVID-19 in Iran, and found that, in addition to the incidence of the previous day, factors that can effectively improve the accuracy of the prediction include hand sanitizer usage and hand washing frequency. Kass et al. [18] analyzed the relationship between Body Mass Index (BMI) and age in patients diagnosed with COVID-19 using a Multiple Linear Regression Model, and concluded that obesity may increase the infection rate of COVID-19. Yang et al. [19] estimated the early mortality of COVID-19 with a linear regression model, and concluded that the mortality of COVID-19 was lower than that of the coronavirus epidemics caused by SARS-CoV and MERS-CoV. Xiong et al. [20] analyzed the correlation between initial CT features and turbidity progression in COVID-19 patients using linear regression models and the Spearman correlation coefficient.

Data Sources
The open data sets of confirmed cases published by the Center for Systems Science and Engineering (CSSE) of Johns Hopkins University were used in this study. The data from all over the world were collated from 22 January, the early stage of the epidemic. Considering that countries with fewer confirmed cases might show less obvious climate-related patterns, and that too few study objects would make the results less universal, 65 countries with more than 10,000 confirmed cases from 22 March to 22 June were selected. The cumulative number of confirmed cases on each day minus that of the previous day gives the number of new daily confirmed cases in each country, which reflects the transmission capacity of the epidemic.
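The differencing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the processing code used in the study: the function name daily_new_cases and the sample numbers are assumptions made for the example, and the cumulative series is assumed to be ordered by date.

```python
def daily_new_cases(cumulative):
    """Return day-over-day increments of a cumulative case series.

    The result has one element fewer than the input, because the first
    day has no previous day to subtract.
    """
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


# Made-up cumulative counts for five consecutive days (illustration only).
cumulative_confirmed = [100, 130, 175, 175, 240]
new_cases = daily_new_cases(cumulative_confirmed)
print(new_cases)  # [30, 45, 0, 65]
```

A day on which the cumulative count does not change yields zero new cases, as on the fourth day of the example.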
The climate data in this study come from the daily records of weather stations around the world collected by the China Meteorological Data Network (http://data.cma.cn/). We selected the monthly average maximum temperature, monthly average minimum temperature, sea level pressure, altitude, wind speed, rainfall, dew point temperature and relative humidity of each region from 22 March to 22 June as the climatic indicators that reflect regional weather changes during the epidemic period.
In this experiment, the 65 countries with more than 10,000 confirmed COVID-19 cases as of 24:00 on 22 June were selected as experimental subjects. The number of newly diagnosed cases per day and 8 climate factors were collected for experimental analysis. The number of new daily confirmed cases (New) is set as the dependent variable y; the monthly average maximum temperature during the epidemic period (Tmax), the monthly average minimum temperature during the epidemic period (Tmin), sea level pressure (Sea_Pressure), wind speed (Wind_Speed), elevation (Elevation), rainfall (Rainfall), dew point temperature (DP) and relative humidity (Humidity) are the independent variables x1, x2, x3, x4, x5, x6, x7, x8, respectively. Samples of the observation data are shown in Table 1.

Methodology
We chose the Multiple Linear Regression Analysis Method to analyze the correlation between the number of daily increased confirmed cases in each region and the climate indicators of the region. Firstly, the relevant Multiple Linear Regression Method was used to perform a series of verifications and establish a multiple regression equation. Then the Pearson correlation coefficient was used to evaluate the relative importance of the influence of each independent variable on the dependent variable, i.e., the correlation coefficient between each independent variable and the dependent variable. The linear relationship between them was discovered and the correlation between the observed variables was determined. The advantage of using this method is that the relationship between the variables can be clearly defined and expressed quantitatively, and the influence of different climatic factors on the number of new diagnosed cases per day can be clearly shown.
According to the selected observation variable data, the Multiple Linear Regression Model can be constructed as shown in Equation (1).
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_8 x_8 + \varepsilon \tag{1}$$
where $y$ is the dependent variable, which represents the number of new daily confirmed cases in this experiment; $x_1, x_2, \ldots, x_8$ are the independent variables, respectively representing the monthly average maximum temperature, monthly average minimum temperature, sea level pressure, wind speed, elevation, rainfall, dew point temperature and relative humidity. $\beta_0, \beta_1, \ldots, \beta_8$ are the unknown parameters, where $\beta_0$ is the intercept and $\beta_1, \ldots, \beta_8$ are the coefficients of the corresponding independent variables; $\varepsilon$ is the error term, an unobservable random variable with mean zero and variance $\sigma^2 > 0$, i.e., $\varepsilon \sim N(0, \sigma^2)$. The above Linear Regression Model can be used to predict the number of new daily confirmed cases and to determine the correlation between each independent variable and the dependent variable. For different dates, n groups of different data can be obtained, as shown in Equation (2).
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_8 x_{i8} + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{2}$$
where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent of each other and $\varepsilon_i \sim N(0, \sigma^2)$. The Pearson correlation coefficient is calculated on the basis of the above data. For a single independent variable $x$, the calculation method is shown in Equation (3).
$$R = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{3}$$
where $\bar{x}$ and $\bar{y}$ are the sample means of the independent variable and the dependent variable.
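The Pearson correlation coefficient for a single independent variable, as described above, can be computed directly from its definition: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below is illustrative only; the function name pearson_r and the sample values are assumptions, not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with >= 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: co-variation of the two series around their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the spreads of each series.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical daily maximum temperatures and new-case counts.
tmax = [12.0, 15.0, 18.0, 21.0, 24.0]
new = [900, 820, 760, 640, 610]
print(round(pearson_r(tmax, new), 3))
```

Because the coefficient is symmetric in its two arguments, pearson_r(tmax, new) and pearson_r(new, tmax) return the same value.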
R (the Pearson correlation coefficient) is used to describe the correlation between two groups of different data. When the development trends of the two groups of data are weakly correlated, 0 ≤ |R| < 0.3. When they show medium correlation, 0.3 ≤ |R| < 0.6. When they show high correlation, 0.6 ≤ |R| ≤ 1.0.
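The three correlation bands can be written as a small helper. This is a sketch under the thresholds stated above (0.3 and 0.6); the function name correlation_strength is hypothetical, and treating |R| = 1 as high correlation is an assumption made here so that every valid coefficient falls into a band.

```python
def correlation_strength(r):
    """Map a Pearson coefficient to the weak / medium / high bands."""
    magnitude = abs(r)
    # Allow a tiny tolerance for floating-point rounding above 1.
    if magnitude > 1.0 + 1e-9:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.3:
        return "weak"
    if magnitude < 0.6:
        return "medium"
    return "high"


for r in (0.12, -0.45, 0.6, -0.93):
    print(r, correlation_strength(r))
```

The sign of R is ignored on purpose: a strongly negative coefficient indicates a relationship as strong as a strongly positive one, only in the opposite direction.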
Pearson correlation coefficient was utilized to analyze the correlation between variables, and the correlation coefficient values are shown in Table 2.
In Table 2, R is the correlation coefficient, R01 is the correlation coefficient between the monthly average maximum temperature (Tmax) and the number of new daily confirmed cases (New), and R10 is the correlation coefficient between New and Tmax. Because the correlation between Tmax and New is equivalent to the correlation between New and Tmax, the table is symmetric: R01 = R10, R12 = R21, and so on. The closer the absolute value of the correlation coefficient is to 1, the stronger the correlation between the two observed variables in the region.
[Table 2: correlation coefficient matrix over New, Tmax, Tmin, Sea_Pressure, Wind_Speed, Elevation, Rainfall, DP, Humidity]

Model Testing
After the Multiple Linear Regression Model has been established, it is necessary to test it. In this experiment, the adjusted coefficient of determination $\bar{R}^2$ is selected to evaluate the performance of the model. Its formula is shown in Equation (4).
$$\bar{R}^2 = 1 - \frac{SSE / (n - k - 1)}{SST / (n - 1)} \tag{4}$$
where SSE is the Sum of Squares of Residuals and SST is the Sum of Squares of Deviations, n is the number of observations and k is the number of independent variables; n−k−1 is the degree of freedom of SSE, and n−1 is the degree of freedom of SST. By dividing SSE and SST by their respective degrees of freedom, the influence of the number of variables on the coefficient of determination can be suppressed, and the goodness of fit of the linear regression model can be better reflected. The closer the adjusted coefficient is to 1, the better the model fits the relationship between the variables.
The difference between the adjusted coefficient of determination and the ordinary coefficient of determination R² is that a penalty term is introduced. With the penalty term, the coefficient increases only when variables that are really helpful for the analysis are introduced. This solves the problem that the ordinary coefficient of determination gradually increases as the number of independent variables in the model increases, whether or not those variables are informative.
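The effect of the penalty term can be seen in a short calculation. The sketch below computes the adjusted coefficient as one minus the ratio of SSE and SST, each divided by its degrees of freedom (n−k−1 and n−1), following the description above. The function name adjusted_r_squared and the numbers are illustrative assumptions, not results from this study.

```python
def adjusted_r_squared(y_true, y_pred, k):
    """Adjusted R^2 for a model with k independent variables."""
    n = len(y_true)
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    mean_y = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residuals
    sst = sum((t - mean_y) ** 2 for t in y_true)             # deviations
    if sst == 0:
        raise ValueError("undefined when the observed values are constant")
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

# Same predictions, but the second model "spent" more variables on them.
print(adjusted_r_squared(observed, predicted, k=1))
print(adjusted_r_squared(observed, predicted, k=3))
```

For identical predictions, the model that uses three variables receives a lower adjusted coefficient than the model that uses one, which is the behavior the penalty term is meant to produce.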
To ensure that there are enough data for model training, 70% of the observation data are selected as the training set and the remaining 30% are used as the test set. The climate data for each day in the test set are input into the established Multiple Linear Regression Model to obtain the predicted number of new confirmed cases on that day, which is then compared with the actual observed value. This tests whether the constructed Multiple Linear Regression Model is in line with the actual situation.
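The 70/30 procedure can be sketched end to end with the Python standard library. Everything in this example is an assumption made for illustration: the data are synthetic and noise-free, only two predictors are used instead of eight, the split is a random shuffle (the text does not state how the 70% were chosen), and the coefficients are obtained from the normal equations, which is one common way to fit a least-squares model and not necessarily the procedure used in this study.

```python
import random


def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 is the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Synthetic data: y = 50 - 2 * temperature + 0.5 * humidity (no noise).
rng = random.Random(0)
rows = [(rng.uniform(5, 35), rng.uniform(20, 90)) for _ in range(40)]
y = [50 - 2 * t + 0.5 * h for t, h in rows]

# 70% of the observations for training, the remaining 30% for testing.
indices = list(range(len(rows)))
rng.shuffle(indices)
cut = (7 * len(indices)) // 10
train, test = indices[:cut], indices[cut:]

beta = fit_ols([rows[i] for i in train], [y[i] for i in train])
errors = [abs(predict(beta, rows[i]) - y[i]) for i in test]
print(len(train), len(test), max(errors))
```

Because the synthetic data contain no noise, the largest test error is close to zero. With real case counts the errors would be much larger, and a chronological split might be preferable to a random one for time-ordered data.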

Regression Analysis between the Number of New Daily Confirmed Cases and Climate Variables
After data processing with Multiple Linear Regression Models, the fitting function between the number of new daily confirmed cases and each climate variable is obtained. Among the 65 linear regression models (one model per country), the correlation coefficients of 42 models were greater than 0.5, showing that these models fit well. The six countries with the most confirmed cases as of 22 June were selected for display. Tables 3 and 4 show the Multiple Linear Regression Models constructed from the training data of the United States, Brazil, India, Mexico, South Africa and Peru.

The Correlation between the New Confirmed Case Number and Climate in Different Countries
We explore the relationship between the daily number of new confirmed cases (New) and climate data in different countries, and select the six countries with the most confirmed cases as of 24:00 on 22 June for demonstration. The correlation coefficients between the daily number of new confirmed cases and the climate data are shown in Table 5. As can be seen in Table 5, the correlation coefficients between New and each climatic variable indicate significant correlations between New and Tmax, Tmin and Humidity in these countries.

The Correlation between the Number of New Daily Confirmed Cases and Various Climate Variables in Different Countries
For the 65 countries selected in this experiment, the correlation coefficients between New and the various climate parameters are shown in Figure 2.
Based on the above analysis, the following further inferences can be drawn: 1. Tmax and Tmin both measure air temperature, and the dew point temperature can be obtained from the relative humidity and temperature [21]. It is therefore inferred that the activity of COVID-19 is mainly related to temperature and humidity. Notably, when the correlations of temperature and of humidity with the number of new daily confirmed cases are compared, more countries show medium or high correlations for temperature. Temperature thus seems to have a more obvious impact on virus activity. However, the correlation between the number of new daily confirmed cases and humidity should not be ignored. The impact of climate factors on the spread of the epidemic should be considered in terms of temperature and humidity together.
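The remark that dew point temperature follows from relative humidity and air temperature can be illustrated with the Magnus approximation, a widely used empirical formula. The text cites [21] for this relationship without giving a formula, so the choice of the Magnus form and of the constants below (17.62 and 243.12 °C, a commonly quoted parameter set for water vapor over liquid water) is an assumption of this sketch, as is the function name.

```python
import math

# Commonly quoted Magnus parameters; assumed here, not taken from [21].
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_celsius(temp_c, rel_humidity_pct):
    """Approximate dew point (deg C) from air temperature and humidity."""
    if not 0 < rel_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100] percent")
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_A * temp_c / (MAGNUS_B + temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


# At 20 deg C and 50% relative humidity the dew point is roughly 9 deg C.
print(round(dew_point_celsius(20.0, 50.0), 1))
# At saturation (100%) the dew point equals the air temperature.
print(round(dew_point_celsius(20.0, 100.0), 1))
```

Since DP is determined by temperature and humidity in this way, it carries little information beyond those two variables, which is consistent with grouping Tmax, Tmin, DP and Humidity under temperature and humidity effects.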

Geospatial Analysis of the Correlation between the New and Climate Variables
Figures 3-6 show maps of the correlation between New and Tmax, Tmin and Relative Humidity, which on the whole verify and illustrate the analysis and inference above. In the maps, red indicates high correlation, yellow indicates medium correlation, blue-green indicates low correlation, and white marks countries that were not selected. We analyze the 65 selected countries worldwide in order to reveal the impact of various climate factors on the activity of COVID-19 at a global scale. This reduces the risk of erroneous conclusions that could arise when occasional weather conditions in an individual country happen to resemble the climatic conditions related to virus activity, and so makes the experimental results more reliable and universal. Combining this with the spatial visualization in Figures 3-6, we further infer the following:
1. The intervention of epidemic prevention and control measures can restrain the influence of climatic factors on the spread of the epidemic. China, where the epidemic broke out in January, shows a low correlation between New and the various climate factors. Israel, South Korea and Singapore, which had good epidemic prevention and control, also show a low correlation between New and the various climate factors. Therefore, this article speculates that government intervention through epidemic prevention and control measures can effectively reduce the spread of the epidemic and restrain the impact of climate on it.

Discussion
According to the research results, temperature and humidity have a high correlation with the activity of SARS-CoV-2. Other literature shows that, in addition to temperature and humidity, biological sex [22], obesity rate [23], age [24] and comorbidities [25] can affect the transmission of COVID-19. Therefore, when formulating measures to prevent and control the epidemic, governments should pay special attention to environments in which SARS-CoV-2 is highly active. Countries should strengthen environmental disinfection and social isolation.
The development of the epidemic can be predicted according to changes in temperature and humidity, and corresponding prevention and control measures can be taken. As the climate in an area changes and virus activity changes with it, prevention and control measures should be strengthened while virus activity is still low, so as to suppress the trend of an outbreak and achieve the purpose of epidemic prevention and control. At the same time, attention should be paid to environmental conditions, such as temperature and humidity, that favor virus survival, and to transmission channels. Therefore, countries must take the influence of climatic factors into account and take active measures to control the first COVID-19 epidemic while virus activity is at a low level. They should also not relax prevention and control measures, so as to prevent a second COVID-19 epidemic caused by increased virus activity as climate factors change.
At present, the specific mechanism of the interaction between temperature, humidity and virus activity is unknown. However, like influenza virus, COVID-19 can be transmitted by aerosol [26]. Casanova reports that, compared with medium relative humidity (50%), the virus has a greater survival rate or greater protection at high relative humidity (80%) [27]. This research therefore speculates that low temperature and high humidity increase the amount of suspended particles in the atmosphere, which provides ideal conditions for virus attachment, replication and transmission. Low temperature can also dry the mucous membrane and reduce the function of cilia, which supports the survival and transmission of the virus and the spread of disease [28]. It is thus speculated that temperature and humidity affect the spread of COVID-19 through two routes. On the one hand, temperature affects the human mucous membrane and reduces human resistance to viruses. On the other hand, high humidity can increase the mass concentration of aerosols and the number of aerosol particles in the air [29], which increases the speed of virus transmission. Together, temperature and humidity reduce human resistance to viruses and accelerate virus transmission, so the number of new confirmed cases increases daily.

Conclusions and Future Work
The analysis in this paper shows that the activity of the new coronavirus has the following characteristics:
1. The activity of COVID-19 is highly correlated with temperature and humidity, with temperature showing the stronger correlation. Wind speed, sea level pressure, altitude and rainfall have little effect on the spread of the epidemic.
2. The intervention of epidemic prevention and control measures can restrain the influence of climatic factors on the spread of the virus. But climatic factors alone are not enough to restrain the spread of the epidemic.
3. Temperature and humidity in tropical areas have a more obvious impact on the spread of the epidemic.
Accordingly, this paper proposes the following recommendations for global COVID-19 prevention and control.
1. In the early stage of the epidemic, each country should take appropriate or even more stringent prevention and control measures to minimize the risk of an outbreak, taking into account the development and changes of climatic factors such as temperature and humidity.
2. As time goes by, the climate changes in the northern and southern hemispheres will affect the activity of the virus. Special attention should be paid to prevention and control, so as to prevent the spread of the COVID-19 epidemic in the southern hemisphere and a secondary outbreak in some parts of the northern hemisphere due to increased virus activity.
3. Countries with better epidemic prevention and control, or with a less serious epidemic situation, should not take the epidemic lightly. It is necessary to strictly control public areas to prevent the risk of a renewed outbreak caused by enhanced virus activity due to climatic factors such as temperature and humidity.
This paper assesses the correlation between the spread of the COVID-19 epidemic and climatic factors. The number of confirmed cases is inevitably underestimated because testing coverage for COVID-19 differs between countries, and the impact of changes in policies and local prevention and control strategies on the spread of the epidemic was not assessed in this study. These issues need to be explored in more detail in future work.

Conflicts of Interest:
The authors declare no conflict of interest.