Future Climate Data from RCP 4.5 and Occurrence of Malaria in Korea

Since its reappearance at the Military Demarcation Line in 1993, malaria has been occurring annually in Korea. Malaria is regarded as a third grade nationally notifiable disease susceptible to climate change. The objective of this study is to quantify the effect of climatic factors on the occurrence of malaria in Korea and construct a malaria occurrence model for predicting the future trend of malaria under the influence of climate change. Using data from 2001–2011, the effect of time lag between malaria occurrence and mean temperature, relative humidity and total precipitation was investigated using spectral analysis. Also, a principal component regression model was constructed, considering multicollinearity. Future climate data, generated from RCP 4.5 climate change scenario and CNCM3 climate model, was applied to the constructed regression model to simulate future malaria occurrence and analyze the trend of occurrence. Results show an increase in the occurrence of malaria and the shortening of annual time of occurrence in the future.

Introduction
Malaria is an acute infectious disease caused by infection of red blood cells and liver cells with Plasmodium parasites. Approximately 300-500 million cases of malaria are reported annually, and over a million patients die from this parasitic infection. The World Health Organization (WHO) has designated malaria as one of the six major tropical diseases. Plasmodium vivax and Plasmodium falciparum account for more than 95% of reported cases, and in Korea, Plasmodium vivax infection is found in the majority of the reported cases. Generally, infectious diseases that use a host as a means of infection are highly susceptible to climate, which influences interactions within the ecosystem [1]. Mosquito-borne diseases are especially susceptible and are known to be greatly influenced by temperature, precipitation, humidity and other factors. Among these diseases, malaria is the most thoroughly studied and is also the most susceptible to climate change [2].
Many studies have investigated how malaria is related to climate change and climate factors. Poveda et al. [3] studied malaria and the time lag effect of the El Niño climate in Colombia, and Craig et al. [4] investigated the relationship between malaria and the probability distribution of climate factors in the Sahara region. Paaijmans et al. [5] correlated malaria with mean temperature and analyzed how the influence of malaria changes with daily temperature change [6]. Olson et al. [7] analyzed the relationship between malaria, wetlands and climate factors. Kuhn et al. [8] predicted malaria using an impulse function computed from climate factors, whereas Thomson et al. [9] predicted malaria infection from seasonal climate factors. Jhajharia et al. [10] investigated the influence of climate on the incidence of malaria in the Thar desert of Northwest India and found that climatic variability and rise in temperature were key determinants of malaria transmission.
Current malaria and climate-related investigations have focused on determining the correlation between malaria and climate factors for predicting the occurrence of malaria [11-15]. Since climate change is now accepted as real, changes in the characteristics of malaria in relation to climate change have been investigated [16,17]. For example, malaria's vulnerability to trends of the past century has been analyzed [2,18,19]. Likewise, changes in the future characteristics of malaria in Africa, the most malaria-vulnerable continent, have been analyzed [20-22]. In order to prepare adequately for ongoing global warming, van Lieshout et al. [23] predicted fluctuations in malaria incidence according to the SRES climate change scenario. These studies show future trends of malaria and climate change and how the occurrence of malaria infection will be affected by future climate change [16,24-31].
The objective of this study is therefore to investigate the occurrence of malaria and its correlation with climatic factors and construct a regression model for predicting malaria occurrences in Korea from climate factors. By simulating future malaria occurrences in Korea, the model can provide basic data for public health agencies. For this study, data on monthly malaria occurrences and climatic factors from 2001-2011 was collected. By applying the climate data generated from a climate change scenario and a climate model to the regression model, future malaria occurrences and their trends were analyzed.

Trend of Malaria Occurrence
Although it is commonly believed that malaria seems to be decreasing in Korea [32]). Malaria is transmitted by female Anopheline mosquitoes [33]. Most mosquitoes that carry malaria become much more active as the weather gets warmer [34]. Therefore, in Korea, an increase in malaria infection is strongly related to the rise in temperature caused by climate change [35].

Data Collection
Jang et al. [1] reported that, among the diseases related to climate factors and climate change, malaria has the highest number of domestically infected patients. Since a national infectious disease database has been established, quality data can now be acquired. Monthly data on designated infectious diseases between 2001 and 2011 from the Center for Disease Control & Prevention were used to obtain malaria occurrences. Data from 2001-2008 were used for calibration and data from 2009-2011 were used for verification. For the same period, climate data were obtained from the Seoul Regional Meteorological Observatory. According to the Center for Disease Control & Prevention [32], the highest malaria occurrence in 2008 was in Gyeonggi province (475 reported cases), followed by Seoul (180 reported cases) and Incheon (92 reported cases). The distribution of malaria was highly concentrated in the Seoul and Gyeonggi areas. Thus, climate variables from the Gyeonggi and Seoul Meteorological Observatories were averaged into one series for use in this study. Data from weather stations at an elevation (E.L.) of 200 m above sea level may distort the average values of regional meteorological factors, so these were excluded from the analysis. Meteorological factors included in the analysis were average temperature (°C), relative humidity (%), and precipitation (mm). The collected meteorological and malaria data are shown in Figure 1a, and scatter diagrams of malaria against average temperature, relative humidity and precipitation are shown in Figure 1b-d.

Methodology
This study uses regression analysis to model the relationship between malaria occurrence, the dependent variable, and the climate variables, the independent variables. Spectral analysis was performed to reflect the time lag between the variables, and the BDS test was used to check the randomness of the time series. In addition, principal components regression was applied and tested to account for multicollinearity between the variables [36,37].

Spectral Analysis
The autospectrum and cross spectrum describe how the variance of a single time series, or the covariance of multiple time series, is distributed over frequency [38]. For data $x(t)$ with time interval $t$ and lag $k$, the autocovariance is estimated as

$$c_{xx}(k) = \frac{1}{N}\sum_{t=1}^{N-k}\left(x_t-\bar{x}\right)\left(x_{t+k}-\bar{x}\right) \qquad (1)$$

where $N$: number of data in the series, and $t$ and $\Delta t$: time interval and time step. The autocorrelation function is obtained by dividing the autocovariance by the variance $\sigma_x^2$, and the autospectrum is obtained by applying the Fourier transform to the autocorrelation function. Spectral analysis gives periodic information about the malaria time series. Cross spectral analysis applies the same idea to the cross covariance of two time series $x(t)$ and $y(t)$ and identifies the covariance between the two series at lag $k$. The cross covariance and cross spectrum ($X_{xy}$) of two time series with time interval $\Delta t$ can be determined using Equations (2) and (3):

$$c_{xy}(k) = \frac{1}{N}\sum_{t=1}^{N-k}\left(x_t-\bar{x}\right)\left(y_{t+k}-\bar{y}\right) \qquad (2)$$

$$X_{xy}(f) = \Delta t \sum_{k=-(N-1)}^{N-1} c_{xy}(k)\, e^{-i 2\pi f k \Delta t} \qquad (3)$$

Using the cross spectrum of the two time series, the coherency can be calculated, which gives information about the frequencies shared by the two series (see Equation (4)):

$$C_{xy}(f) = \frac{\left|X_{xy}(f)\right|}{\sqrt{X_{xx}(f)\,X_{yy}(f)}} \qquad (4)$$

Coherency between two time series is commonly represented in the form $0 \le C_{xy} \le 1$.
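As a minimal sketch of the quantities described above, the following Python computes the sample autocovariance with the 1/N normalisation of Equation (1), the autocorrelation, and a raw periodogram. The synthetic monthly series and the function names are illustrative assumptions, not the study's data or code.

```python
import cmath
import math

def autocovariance(x, k):
    """Biased (1/N) sample autocovariance of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n

def autocorrelation(x, k):
    """Autocovariance at lag k divided by the lag-0 value (the variance)."""
    return autocovariance(x, k) / autocovariance(x, 0)

def periodogram(x):
    """Return (frequency, power) pairs at the frequencies j/N, j = 1..N//2."""
    n = len(x)
    mean = sum(x) / n
    result = []
    for j in range(1, n // 2 + 1):
        f = j / n  # cycles per time step (per month for monthly data)
        s = sum((x[t] - mean) * cmath.exp(-2j * math.pi * f * t) for t in range(n))
        result.append((f, abs(s) ** 2 / n))
    return result

# Hypothetical monthly series: 8 years with a pure 12-month cycle.
series = [100 + 80 * math.sin(2 * math.pi * t / 12) for t in range(96)]
peak_frequency, _ = max(periodogram(series), key=lambda pair: pair[1])
dominant_period = 1 / peak_frequency  # in months
```

For this synthetic series the periodogram peaks at a frequency of 1/12 per month, so `dominant_period` is about 12 months, which is how a 12-month cycle shows up as a frequency of about 0.083.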

Brock-Dechert-Scheinkman (BDS) Statistic
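The BDS statistic tests the hypothesis that a time series follows a random distribution, based on the correlation integral. It is an effective tool for distinguishing a random time series from a nonlinear chaotic or stochastic system [39-41]. Grassberger and Procaccia [42] suggested the correlation integral as a method to measure the fractal dimension of deterministic data. For embedding dimension $m$, it is computed as

$$C(m, N, r) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \Theta\left(r - \lVert x_i^m - x_j^m \rVert\right) \qquad (5)$$

where $N$ is the size of the data set, $M = N-(m-1)\tau$ is the number of embedded points in $m$-dimensional space, $\Theta$ is the Heaviside step function, and $\lVert\cdot\rVert$ denotes the sup-norm. For the target data $\{x_t\}$, the embedded points are $x_i^m = (x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau})$. If the target data are a stationary stochastic time series for which the limit exists, the value of the corresponding limit can be represented as shown in Equation (6):

$$C(m, r) = \lim_{N \to \infty} C(m, N, r) \qquad (6)$$

Here, if the data are independent and identically distributed, $C(m, r) = C(1, r)^m$. In this case, $\sqrt{M}\left[C(m, N, r) - C(1, N, r)^m\right]$ is asymptotically normally distributed with an average of 0 and variance $\sigma^2$ as follows:

$$\sigma^2(m, r) = 4\left[K^m + 2\sum_{j=1}^{m-1} K^{m-j} C^{2j} + (m-1)^2 C^{2m} - m^2 K C^{2m-2}\right] \qquad (7)$$

The constant $C$ can be estimated by $C(1, N, r)$, and $K$ can be estimated as

$$K(N, r) = \frac{6}{M(M-1)(M-2)} \sum_{1 \le i < j < k \le M} \frac{1}{3}\left[\Theta_{ij}\Theta_{jk} + \Theta_{ik}\Theta_{kj} + \Theta_{ji}\Theta_{ik}\right], \qquad \Theta_{ij} = \Theta\left(r - \lVert x_i - x_j \rVert\right) \qquad (8)$$

Thus, under the assumption that the time series data are independent and identically distributed, for $m > 1$ the BDS statistic can be represented as

$$BDS(m, N, r) = \sqrt{M}\,\frac{C(m, N, r) - C(1, N, r)^m}{\sigma(m, N, r)} \qquad (9)$$

The BDS statistic can be an effective tool to determine whether the given time series is random or is a nonlinear system (chaotic or stochastic). To utilize the BDS statistic, selection of the $m$ and $r$ values is critical. In this study, $m$ was defined as $2 \le m \le 5$ [39], and for $r$,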
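The following Python is a rough sketch of how a BDS statistic of this kind can be computed from the correlation integral in the sup-norm. It uses the standard asymptotic variance formula from the BDS literature. The delay of 1, the choice of r as half the sample standard deviation, and the use of the full series for C(1, r) are simplifying assumptions made here for illustration; they are not taken from the study.

```python
import math
import random
import statistics

def embed(x, m, tau=1):
    """Delay-embed x into m-dimensional points (tau is an assumed delay)."""
    count = len(x) - (m - 1) * tau
    return [[x[i + j * tau] for j in range(m)] for i in range(count)]

def correlation_integral(x, m, r, tau=1):
    """Fraction of pairs of embedded points closer than r in the sup-norm."""
    pts = embed(x, m, tau)
    n = len(pts)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(a - b) for a, b in zip(pts[i], pts[j])) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))

def k_estimate(x, r):
    """Estimate K: probability that two other points both lie within r of a point."""
    n = len(x)
    total = 0
    for i in range(n):
        d = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < r)
        total += d * (d - 1)
    return total / (n * (n - 1) * (n - 2))

def bds_sigma(c, k, m):
    """Asymptotic standard deviation under the i.i.d. hypothesis."""
    var = k ** m + (m - 1) ** 2 * c ** (2 * m) - m ** 2 * k * c ** (2 * m - 2)
    var += 2 * sum(k ** (m - j) * c ** (2 * j) for j in range(1, m))
    return math.sqrt(4 * var)

def bds_statistic(x, m, r, tau=1):
    """sqrt(M) * (C(m, r) - C(1, r)^m) / sigma; compare with a standard normal."""
    c1 = correlation_integral(x, 1, r, tau)
    cm = correlation_integral(x, m, r, tau)
    sigma = bds_sigma(c1, k_estimate(x, r), m)
    n_embedded = len(x) - (m - 1) * tau
    return math.sqrt(n_embedded) * (cm - c1 ** m) / sigma

# Hypothetical i.i.d. noise series; r = 0.5 standard deviations is an assumption.
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(150)]
bds_noise = bds_statistic(noise, 2, 0.5 * statistics.stdev(noise))
```

A statistic whose absolute value is large relative to standard normal quantiles would be read as evidence against independence. Production implementations usually take more care over which points enter C(1, r) and over small-sample corrections.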

Principal Components Regression
If the selected independent variables are correlated and show multicollinearity during regression analysis, some of the independent variables may be eliminated or new sets of observed values may be introduced. Depending on the situation in the field, the correlation itself may also be eliminated [44]. However, elimination of independent variables is not desirable because of the information loss. As an alternative, principal component regression (PCR) is often used [45-48]. Principal component regression uses principal components analysis (PCA) to compress the dimension, as suggested by Morison [49], selects principal components (fewer in number than the original variables) and combines them with the regression model. By estimating the principal component scores of the independent variables and conducting regression analysis on them, information loss can be minimized while avoiding multicollinearity [50].
In principal component regression, the diagonal matrix $\Lambda$ of the eigenvalues ($\lambda_i$) of the correlation coefficient matrix $R$ and the eigenvector matrix $V$ satisfy the following Equation (10):

$$R V = V \Lambda \qquad (10)$$

The eigenvector matrix is orthogonal, so $VV^{T} = I$ is valid. When this is substituted into the general matrix form of the regression model, it can be arranged into a regression model as:

$$y = X\beta + \varepsilon = XVV^{T}\beta + \varepsilon = (XV)(V^{T}\beta) + \varepsilon \qquad (11)$$

where $X$: matrix whose $i$th column is the $i$th independent variable, $\beta$: coefficients matrix of the independent variables, and $\varepsilon$: error term.
In Equation (11), XV is a linear combination of the independent variables. It is a transformation of the independent variables, obtained by multiplying them by the eigenvectors so that the resulting variables are orthogonal; these are called principal components. Because the principal components are mutually uncorrelated, the multicollinearity issue is resolved and regression analysis can be conducted without loss of information.
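A minimal NumPy sketch of principal component regression along these lines is given below: the predictors are standardized, the correlation matrix is eigendecomposed, the scores XV are formed, and ordinary least squares is run on the leading scores. The synthetic predictors, the coefficients used to generate the response, and the function names `fit_pcr` and `predict_pcr` are assumptions for illustration only.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Regress y on the leading principal component scores of standardized X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Eigen-decomposition of the correlation matrix (symmetric, so eigh applies).
    eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, V = eigvals[order], V[:, order]
    Vk = V[:, :n_components]
    scores = Z @ Vk                            # principal component scores
    design = np.column_stack([np.ones(len(y)), scores])
    alpha, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "V": Vk, "alpha": alpha, "eigvals": eigvals}

def predict_pcr(model, X):
    """Apply the stored standardization, projection and coefficients."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    return model["alpha"][0] + Z @ model["V"] @ model["alpha"][1:]

# Hypothetical collinear predictors (humidity tracks temperature closely).
rng = np.random.default_rng(42)
temperature = rng.normal(size=96)
humidity = 0.8 * temperature + 0.2 * rng.normal(size=96)
precipitation = rng.normal(size=96)
X = np.column_stack([temperature, humidity, precipitation])
y = 50 + 30 * temperature + 10 * precipitation + rng.normal(size=96)

model_full = fit_pcr(X, y, n_components=3)
model_two = fit_pcr(X, y, n_components=2)
scores_full = ((X - model_full["mean"]) / model_full["std"]) @ model_full["V"]
```

Keeping every component reproduces the ordinary least squares fit, while dropping the trailing components trades a small amount of fit for coefficients that are not inflated by collinearity.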

Nonlinear Regression Analysis
Regression analysis was conducted on monthly malaria occurrences and climate variables for the period between 2001 and 2008, using nonlinear regression of the reported malaria occurrence counts on the climate variables. Many recent studies of malaria and climate variables in Korea [51,52] considered the time lag effect, so it was assumed that the correlation depends on the time lag. The autospectrum of malaria occurrence is shown in Figure 2a, and the coherency between each climate variable and malaria occurrence is shown in Figure 2b-d.
As the autospectrum in Figure 2a shows, the malaria occurrence time series has a strong 12-month periodicity (frequency = 0.083 per month) and a weak 6-month periodicity (frequency = 0.167 per month). Figure 1a confirms that the malaria occurrence series has an annual period. The squared coherency in Figure 2b-d identifies the frequencies shared by each climate variable and malaria. Malaria and average temperature show high coherency at periods of 12 and 6 months (see Figure 2b), and the same holds for humidity (see Figure 2c). Lastly, malaria and precipitation show high coherency at periods of 12, 6 and 4 months. Because coherency expresses the level of correlation between two time series at each frequency, each period of high coherency was taken to indicate a time lag effect. Thus, in this study, the time lag effect of each of these periods was considered in the regression analysis.
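To illustrate how a squared coherency curve of this kind can be estimated, the sketch below averages auto- and cross-spectra over non-overlapping segments with NumPy's FFT. The segment length of 24 months, the synthetic temperature and malaria series, and the function name are assumptions for illustration, not the study's procedure.

```python
import numpy as np

def squared_coherency(x, y, seg_len):
    """Segment-averaged squared coherency; returns (frequencies, coherency)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_seg = len(x) // seg_len
    n_freq = seg_len // 2 + 1
    sxx = np.zeros(n_freq)
    syy = np.zeros(n_freq)
    sxy = np.zeros(n_freq, dtype=complex)
    for s in range(n_seg):
        xs = x[s * seg_len:(s + 1) * seg_len]
        ys = y[s * seg_len:(s + 1) * seg_len]
        fx = np.fft.rfft(xs - xs.mean())
        fy = np.fft.rfft(ys - ys.mean())
        sxx += np.abs(fx) ** 2
        syy += np.abs(fy) ** 2
        sxy += fx * np.conj(fy)
    freqs = np.fft.rfftfreq(seg_len, d=1.0)  # cycles per month
    # Drop the zero frequency, which is empty after removing the segment means.
    return freqs[1:], np.abs(sxy[1:]) ** 2 / (sxx[1:] * syy[1:])

# Hypothetical 16-year monthly series sharing an annual cycle, plus noise.
rng = np.random.default_rng(0)
months = np.arange(192)
temperature = 12 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, 192)
malaria = 100 + 80 * np.sin(2 * np.pi * (months - 1) / 12) + rng.normal(0, 15, 192)

freqs, coh = squared_coherency(temperature, malaria, seg_len=24)
annual_coherency = coh[np.argmin(np.abs(freqs - 1 / 12))]
```

Averaging over several segments matters: with a single segment the estimate equals 1 at every frequency regardless of the data, so the coherency is only informative when multiple segments are combined.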
Regression analysis also assumes that the data are random variables and that each variable is independent. Tests for randomness and nonlinearity were therefore conducted on the malaria occurrence series. In this research, four commonly used nonparametric tests suggested by Salas et al. [53] (Anderson correlogram, run test, Spearman rank correlation coefficient and turning point test) and the BDS statistic test were used.
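As one concrete example of these nonparametric tests, the turning point test can be sketched as follows. For a random series of length N, the number of turning points has expected value 2(N − 2)/3 and variance (16N − 29)/90, and the standardized count is compared with a normal distribution. The synthetic seasonal series is an assumption for illustration.

```python
import math

def turning_point_test(x):
    """Return (number of turning points, expected number, z-score)."""
    n = len(x)
    turns = sum(
        1
        for i in range(1, n - 1)
        if (x[i] > x[i - 1] and x[i] > x[i + 1])
        or (x[i] < x[i - 1] and x[i] < x[i + 1])
    )
    expected = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    z = (turns - expected) / math.sqrt(variance)
    return turns, expected, z

# Hypothetical smooth seasonal series: far fewer turning points than random data.
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(96)]
turns, expected, z = turning_point_test(seasonal)
reject_randomness = abs(z) > 1.96  # two-sided test at the 5% level
```

A strongly seasonal series has only two turning points per year, so the test rejects randomness, which is in line with a periodic occurrence series being flagged as non-random by this kind of test.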
As shown in Table 1, the Anderson correlogram and Spearman tests judged the data to be random, but the run test and turning point test indicated otherwise. The BDS statistic exceeded the critical value at the chosen significance level for every embedding dimension m, which shows that the malaria occurrence series has a nonlinear chaotic or stochastic property. Given this, the regression model for malaria occurrences was constructed with consideration for nonlinearity. The results of the multiple regression analysis of malaria occurrences based on nonlinear regression are shown in Figure 3a and Equation (12).
In Equation (12), $y$ is the monthly malaria occurrence (persons); $x_1$, $x_2$, $x_3$ are the monthly mean temperature (℃) with 0, 6, 12 month lags; $x_4$, $x_5$, $x_6$ are the monthly relative humidity (%) with 0, 6, 12 month lags; and $x_7$, $x_8$, $x_9$, $x_{10}$ are the monthly total precipitation (mm) with 0, 6, 12, 4 month lags. The multiple regression model had a coefficient of determination $R^2$ of 0.805 and an adjusted coefficient of determination of 0.753, with F = 15.5. The significance probability was $4.57 \times 10^{-19}$, so the model was statistically significant (see Figure 3a). However, Figure 1a suggests that the data series are likely to be coherent with one another, in which case the regression model may be affected by multicollinearity. Pearson, Kendall and Spearman correlation coefficients were therefore computed for malaria occurrences and the climate variables; they revealed high correlations, from 0.52 between average temperature and humidity to 0.78 between average temperature and malaria (see Figure 4). Thus, the climate variables are clearly strongly correlated, and a regression model for malaria built through multiple regression analysis could be in error.
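The ten lagged predictors described above can be assembled as in the following sketch, which only builds the design matrix and aligned target and does not attempt to reproduce the regression equation itself. The dictionary keys, function name and index-coded test series are hypothetical.

```python
# Lags in months for each climate variable, in the order x1..x10 of the text.
LAGS = {
    "temperature": (0, 6, 12),
    "humidity": (0, 6, 12),
    "precipitation": (0, 6, 12, 4),
}

def lagged_design(series, target, lags=LAGS):
    """Build rows of lagged predictors and the aligned target values.

    series: dict mapping variable name to a list of monthly values.
    target: list of monthly malaria counts of the same length.
    The first max-lag months are dropped because their lagged values are missing.
    """
    max_lag = max(lag for variable_lags in lags.values() for lag in variable_lags)
    rows = []
    for t in range(max_lag, len(target)):
        rows.append(
            [series[name][t - lag] for name, variable_lags in lags.items()
             for lag in variable_lags]
        )
    return rows, target[max_lag:]

# Index-coded dummy data so each entry reveals which month it came from.
dummy = {
    "temperature": list(range(24)),
    "humidity": [100 + t for t in range(24)],
    "precipitation": [1000 + t for t in range(24)],
}
rows, y = lagged_design(dummy, list(range(24)))
```

With a maximum lag of 12 months, the first year of a calibration period contributes only lagged predictor values and no regression rows.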

PCA-Regression Analysis
To conduct principal component regression analysis between malaria occurrences and climate variables, principal components analysis was conducted on average temperature, average humidity and precipitation. Principal components scores up to principal components 1, 2, and 3 were given-2.9698, 3.0490 and −0.6665, respectively. The principal components can be narrowed down to three under the condition of an 80% cumulative contribution rate; the table of principal components analysis is shown in Figure 5. The principal components were characterized as follows: the first principal component by average temperature ($x_1$) and average humidity ($x_5$, $x_6$); the second by average humidity ($x_6$) and precipitation ($x_7$); and the third by average temperature ($x_1$), average humidity ($x_5$, $x_6$) and precipitation. The malaria occurrences simulated by the principal component regression model are compared in Figure 3. The coefficient of determination $R^2$ of the principal components regression model was 0.743 and the adjusted coefficient of determination was 0.725. For the regression model, F = 4.28 and the significance probability was $3.4 \times 10^{-24}$, so the model was statistically significant (see Figure 3b). Also, since the principal components are mutually uncorrelated, the regression model can be established without the multicollinearity problem. Thus, principal components regression is a plausible approach for simulating malaria occurrences.
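The 80% cumulative contribution rule can be expressed in a few lines, as sketched below. The eigenvalues are made-up numbers chosen only so that they sum to 10 for ten standardized predictors; they are not the values from Figure 5.

```python
def components_for_contribution(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)

# Hypothetical eigenvalues of a 10 x 10 correlation matrix (illustration only).
example_eigenvalues = [4.2, 2.6, 1.5, 0.6, 0.4, 0.3, 0.2, 0.1, 0.06, 0.04]
n_components = components_for_contribution(example_eigenvalues)
```

In this made-up case the first two components explain 68% of the variance and the first three explain 83%, so three components are retained.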

Validation of Malaria Model
To evaluate the suitability of the malaria occurrence model obtained from PCA, the regression model was evaluated using the malaria occurrence data from 2009-2011; the results are shown in Figure 6.
The evaluation shows that the coefficient of determination $R^2$ of the PCA-regression model was 0.852, the NRMSE (Normalized Root Mean Square Error) index, a measure of model efficiency suggested by Scott and Fred [54], was 0.117, and the RE (Relative Error; [36]) index was 0.026. It can be concluded that the malaria occurrence time series simulated by the model agrees well with the observed data.
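The text does not spell out the formulas behind these indices, so the sketch below uses common conventions as explicit assumptions: NRMSE as the root mean square error divided by the observed range, RE as the absolute difference in totals relative to the observed total, and the coefficient of determination as the squared Pearson correlation. The cited references may define them differently.

```python
import math

def nrmse(observed, simulated):
    """RMSE divided by the observed range (assumed normalisation)."""
    n = len(observed)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)
    return rmse / (max(observed) - min(observed))

def relative_error(observed, simulated):
    """Absolute difference of totals relative to the observed total (assumed form)."""
    return abs(sum(simulated) - sum(observed)) / sum(observed)

def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)

# Hypothetical verification data (not the study's values).
observed = [10, 20, 30, 40]
simulated = [12, 18, 33, 39]
scores = {
    "NRMSE": nrmse(observed, simulated),
    "RE": relative_error(observed, simulated),
    "R2": r_squared(observed, simulated),
}
```

Because several normalisations of RMSE are in use, reported NRMSE values are only comparable between studies that share the same definition.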

Climate Change Scenario
To simulate malaria occurrence under climate change using the regression model suggested in Section 4, future climate data are needed. In this study, future climate data were obtained from a climate change scenario and a climate model, and the trend of malaria occurrences was then analyzed. A climate change scenario supplies the future carbon dioxide concentration used as the boundary condition of the climate model, and many different situations have been assumed and applied. The RCP (Representative Concentration Pathways) scenario, which will appear in the IPCC 5th assessment report released in 2013/2014, is the most studied scenario [55-62], and it was applied in this research. The RCP scenario is divided into four scenarios depending on the degree of carbon dioxide reduction [57] (see Table 2). In this study, the SRES A1B scenario from the existing SRES AR4 scenario study [63], which corresponds to RCP 4.5 and seems to conform with reality the most, was analyzed. Another RCP scenario study [64] also found that RCP 4.5 will most effectively reflect the future climate given the present conditions, so RCP 4.5 was applied. Figure 7.

Future Malaria Simulation and Analysis
The study simulated future malaria occurrence using monthly climate data for 2011-2100 from the RCP 4.5 climate change scenario and the HadGEM3-RA climate model, together with the malaria regression model. In the simulated data, the largest monthly counts were 676 occurrences in July 2040 and 695 occurrences in July 2089 (Figure 8). The general pattern is consistent with the present state but shows a slight increase. The trend becomes clearly visible when an annual maximum series is generated from the maximum monthly malaria occurrence in each year. As shown in Figure 9, the simulated annual maximum occurrence increases over time.
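An annual maximum series and its linear trend can be derived from a monthly simulation as sketched below. The synthetic monthly counts and the use of an ordinary least squares slope as the trend measure are assumptions for illustration; the text does not state how the trend in Figure 9 was quantified.

```python
def annual_maximum_series(monthly, start_year):
    """Map each year to its largest monthly value.

    monthly: list of monthly values starting in January of start_year,
    with a length that is a multiple of 12.
    """
    return {
        start_year + i: max(monthly[12 * i:12 * i + 12])
        for i in range(len(monthly) // 12)
    }

def ols_slope(xs, ys):
    """Least squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical simulation: a July peak that grows by 10 cases per year.
seasonal = [0, 0, 0, 5, 20, 60, 100, 80, 30, 5, 0, 0]
monthly = [value + 10 * year for year in range(5) for value in seasonal]
ams = annual_maximum_series(monthly, start_year=2011)
trend_per_year = ols_slope(list(ams.keys()), list(ams.values()))
```

Here `trend_per_year` recovers the built-in increase of 10 cases per year, which is the kind of upward drift an annual maximum series makes visible.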
The annual occurrence pattern also changes. From 2001-2011, August was the month of maximum occurrence, but the simulation for 2011-2100 predicts that July will be the month of maximum occurrence. In addition, compared with the spike in occurrence between June and August in the existing data, occurrence will increase continually between April and July in the future (see Figure 10). Figure 11 shows the monthly average malaria occurrence. Compared with the present concentration of occurrence between May and August, the simulation shows that occurrence will start in April and reach its maximum in July in the future. Thus, preparation for malaria outbreaks must begin earlier than under the present guideline, in which preparation starts after the summer rainy season.
Figure 11. Monthly malaria occurrence plot for each period.
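The shift in the month of maximum occurrence can be checked by averaging each calendar month over a period and locating the peak, as in this sketch. The two synthetic periods are invented to mimic an August peak and a July peak and are not the observed or simulated data.

```python
def monthly_climatology(monthly):
    """Average value of each calendar month (January first) over all years."""
    years = len(monthly) // 12
    return [
        sum(monthly[12 * year + month] for year in range(years)) / years
        for month in range(12)
    ]

def peak_month(monthly):
    """Calendar month (1 = January) with the highest average value."""
    climatology = monthly_climatology(monthly)
    return climatology.index(max(climatology)) + 1

# Hypothetical periods: one peaking in August, one peaking in July.
present_year = [0, 0, 0, 2, 10, 40, 80, 100, 40, 5, 0, 0]
future_year = [0, 0, 0, 10, 40, 80, 110, 90, 30, 5, 0, 0]
present_peak = peak_month(present_year * 11)
future_peak = peak_month(future_year * 90)
```

Comparing the two climatologies month by month also shows whether the rise begins earlier in the year, which is the basis for moving preparation forward.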

Conclusions
The study clarifies the correlation between monthly climate data and malaria occurrence and establishes a regression model. Future malaria occurrence up to the year 2100 is also simulated by applying future climate data, generated from a climate change scenario and a climate model, to the regression model, and the simulated malaria occurrence time series is analyzed. The results of this study can be summarized as follows:
1. The correlation between malaria occurrence and monthly average temperature, relative humidity and precipitation is analyzed, and the time lag effect between malaria occurrence and the climate variables is identified using spectral analysis. The climate variables are strongly coherent with one another, so a regression model built directly on them is affected by multicollinearity. To resolve this issue, principal component regression analysis based on PCA is used to establish the regression model. When the model is tested against malaria occurrences from 2009-2011, the coefficient of determination $R^2$ is 0.852, NRMSE is 0.117 and RE is 0.026, which indicates that the model accounts well for malaria occurrence.
2. Future malaria occurrence is simulated by applying climate data for 2011-2100 from the RCP 4.5 climate change scenario and the CNCM3 climate model to the regression model. Analysis of the simulated data shows that malaria occurrence will gradually increase in general. In addition, the occurrence period will shorten in the future and malaria occurrence will increase before the summer rainy season; thus, the malaria response plan of Korea needs to be adapted.