Future Changes of Precipitation over the Han River Basin Using NEX-GDDP Dataset and the SVR_QM Method

Abstract: After the release of the high-resolution downscaled National Aeronautics and Space Administration (NASA) Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) dataset, it is worth exploiting this dataset to improve the simulation and projection of local precipitation. This study developed ensemble and correction models that combine support vector regression (SVR) and quantile mapping (QM), referred to as SVR_QM, on the basis of historic precipitation in the Han River basin and the 21 NEX-GDDP models. The generated SVR_QM models were applied to project changes of precipitation during the 21st century for the region. Several statistical metrics, including Pearson’s correlation coefficient (PCC), root mean squared error (RMSE), and relative bias (Rbias), were used for evaluation and comparative analyses. The results demonstrated the superior performance of SVR_QM compared with the multi-layer perceptron (MLP), SVR, and random forest (RF) ensemble methods, as well as the simple model average (MME) ensemble method and the single NEX-GDDP models. PCC increased to 0.84 from 0.61–0.71 for the single NEX-GDDP models, RMSE decreased to 34.02 mm from 48–51 mm, and Rbias was almost eliminated. Additionally, the projected precipitation during the 21st century showed an increasing trend at most stations under both the Representative Concentration Pathway (RCP) 4.5 and RCP8.5 emissions scenarios; the regional average precipitation during the middle (2040–2059) and late (2070–2089) 21st century increased by 3.54% and 5.12% under RCP4.5 and by 7.44% and 9.52% under RCP8.5, respectively.


Introduction
Extreme weather will occur more frequently under the background of global warming. As a result, human society, economy, life, and natural ecosystems will be more affected [1,2]. It is essential for researchers, managers, and citizens to know the future climate change trends so that losses caused by extreme disasters can be minimized as much as possible by effective preventive measures.
General circulation models (GCMs) are among the most important and feasible tools for predicting future large-scale climate change and have become a major research tool in the field of global change [3][4][5]. However, GCMs struggle to represent climate systems adequately because of the complexity of these systems and of the underlying topography, and large uncertainties thus exist in their projections, especially at the regional scale. As a method of transforming the output of large-scale, low-resolution global climate models into regional climate change information at small scales and high resolution, downscaling technology can obtain more refined precipitation variation characteristics,

Atmosphere 2019, 10, x FOR PEER REVIEW 3 of 22
to (1) develop station-based SVR_QM ensemble and correction models for NEX-GDDP precipitation in the Han River basin; (2) compare the ensemble prediction ability of MLP, SVM, and RF for NEX-GDDP precipitation; and (3) project the changes of monthly precipitation in the 21st century on the basis of SVR_QM models under RCP4.5 and RCP8.5 in the Han River basin. The contribution of the present study is the use of the SVR_QM method and the NEX-GDDP data to improve the reliability of precipitation simulation and projection at the regional scale in the Han River basin. Improved projections of high-resolution precipitation will better guide long-term management strategies such as water resources allocation, flood mitigation, and ecological layout. The remainder of this paper is organized as follows: Section 2 introduces the Materials and Methods, including the topographical and climatic conditions of the Han River basin, the observed data and NEX-GDDP data used, and the methodology of this study. Section 3 presents and discusses the results. Finally, several conclusions and prospects from this study are presented.

Study Area and Data
The Han River is the main tributary of the middle reaches of the Yangtze River; it is 1577 km long and covers 159 thousand km². In the Han River basin, the river system is vein-like and contains many tributaries. The upper reaches mainly include mountains and hills, whereas the lower reaches include the Jianghan plain. Over the past 50 years, the average annual rainfall has been approximately 700-1100 mm. The Han River basin has been suffering from floods and droughts since the 1990s due to the combined effects of natural and human factors. In this study, 21 meteorological stations in the Han River basin were selected on the basis of data continuity and integrity. In addition, the observed precipitation of eight stations around the Han River was used only to compute the mean precipitation of the region. The observed daily precipitation data in the Han River basin for the period from 1961 to 2005 were used to train and validate the ensemble and mapping models. These data were acquired from the website of China Meteorological Data [27]. Figure 1 depicts the river system, altitudes, and the distribution of the 21 stations on the Han River and the 8 stations around the Han River basin. Table 1 describes the position information for these stations. Stations 1-21 are the stations on the Han River, and the remaining eight stations are those around the Han River basin.
Regarding the NEX-GDDP data, the downscaled historical precipitation and the 21st century precipitation data under the RCP4.5 and RCP8.5 scenarios were chosen for this region. NEX-GDDP ('NASA Earth Exchange Global Daily Downscaled Projections') is a high-resolution (0.25° longitude × 0.25° latitude) daily downscaled dataset released by NASA in June 2015. It was generated from 21 CMIP5 model simulations using the bias-correction spatial disaggregation (BCSD) downscaling technology [28]. Three climatic variables are included in this dataset: daily precipitation and daily maximum and minimum temperature. The time span includes the historical period of 1950-2005 and the future period of 2006-2100 (RCP4.5 and RCP8.5 runs). The total storage space of the dataset source files (*.nc) is more than 12 terabytes (TB). Table 2 describes the RCP4.5 and RCP8.5 scenarios, and Table 3 lists the 21 GCMs that were downscaled to obtain NEX-GDDP. An official website provides more details on this dataset [29], which can be freely acquired via the https://cds.nccs.nasa.gov/nex-gddp/ website. In this study, the global precipitation data of the 21 NEX-GDDP models were downloaded. To evaluate the simulation ability of NEX-GDDP, the average of the nine grid cells nearest to an observed station was regarded as the simulated precipitation for that station. On the basis of the global data, the monthly simulated station data in the Han River basin were obtained.
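The station extraction described above can be illustrated with a minimal sketch. It assumes a regular latitude-longitude grid and approximates "the nine grid cells nearest to the station" by the 3 × 3 block centred on the nearest cell; the function name and arguments are hypothetical and are not taken from the study's software.

```python
import numpy as np

def nine_cell_mean(field, lats, lons, st_lat, st_lon):
    """Mean of the 3x3 block of grid cells centred on the cell nearest a station.

    field : 2-D array (lat, lon) of precipitation for one time step.
    Assumption: regular grid; the centre index is clipped so the block
    never runs off the edge of the array.
    """
    i = int(np.clip(np.argmin(np.abs(lats - st_lat)), 1, len(lats) - 2))
    j = int(np.clip(np.argmin(np.abs(lons - st_lon)), 1, len(lons) - 2))
    return float(field[i - 1:i + 2, j - 1:j + 2].mean())

# Illustrative 0.25-degree grid (synthetic values, not NEX-GDDP data).
lats = np.arange(30.0, 35.0, 0.25)
lons = np.arange(106.0, 115.0, 0.25)
field = np.add.outer(lats, lons)
station_value = nine_cell_mean(field, lats, lons, 32.6, 110.3)
```

Applying the function to each daily field and summing the daily values within each month would give the monthly station series used in the later steps.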

Methodology
This study developed an SVR_QM ensemble and correction framework on the basis of the NEX-GDDP dataset, SVR ensemble methods, and the QM correction method for the precipitation of the Han River basin. Then, according to the established models, the future precipitation was projected. The procedure referred to in this study consisted of four steps for the ensemble simulation and the projection of station precipitation: (1) data preprocessing; (2) selecting the superior ensemble method from MLP, SVR, and RF; (3) combining SVR and the QM method; and (4) evaluation and projection using the combined SVR_QM framework.
The detailed procedure and methods used to develop the SVR_QM models and analyze projected rainfall in this study are discussed in the following subsections.

Data Preprocessing
Data preprocessing consisted of two parts. The first was the extraction of the raw simulations of the 21 NEX-GDDP models, for which this study used the average value of the 9 grid cells nearest to an observed station to represent the simulated precipitation of that station. The region mean was computed from the observed data of the 29 stations using the inverse distance weighted (IDW) method: IDW was used to interpolate the observed data to the corresponding grids of the NEX-GDDP data on the Han River, and the arithmetic mean of these grid data was then used as the region mean. Further, the daily data were transformed into monthly data. The second part was the preparation of the inputs for the ML ensemble models. Principal component analysis (PCA) was selected to extract the principal components (PCs), which reduce the number of input variables while retaining the information [30]. In statistics, PCA is a dimensionality-reduction strategy that simplifies datasets by mapping multiple indicators to a few comprehensive indicators. The detailed steps and equations of PCA can be found in [30].
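The region-mean step described above, in which IDW interpolates station observations to grid points before averaging, can be sketched as follows. The power of 2 and the use of planar distances in degrees are assumptions made for illustration; the text does not state which settings were used.

```python
import numpy as np

def idw(station_xy, station_values, target_xy, power=2.0):
    """Inverse distance weighted interpolation (illustrative sketch).

    station_xy     : (n_stations, 2) coordinates of the observing stations
    station_values : (n_stations,) observed precipitation
    target_xy      : (n_targets, 2) coordinates of the grid-cell centres
    power          : distance exponent (assumed; 2 is a common default)
    """
    diff = target_xy[:, None, :] - station_xy[None, :, :]
    dist = np.maximum(np.linalg.norm(diff, axis=2), 1e-12)  # avoid division by zero
    weights = 1.0 / dist ** power
    return (weights @ station_values) / weights.sum(axis=1)

# Synthetic example: two stations, two grid points.
stations = np.array([[0.0, 0.0], [2.0, 0.0]])
values = np.array([10.0, 30.0])
grid = np.array([[1.0, 0.0], [0.0, 0.0]])
gridded = idw(stations, values, grid)
region_mean = float(gridded.mean())  # arithmetic mean of the gridded values
```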
In this study, the PCs of the 21 NEX-GDDP precipitation series were calculated for each station, and the leading PCs whose cumulative contribution rate exceeded 95% were retained. The selected PCs were used as the inputs of the ML ensemble models. This study also compared the ensemble results with and without PCA and found no clear difference between them. Before PCA, data normalization was conducted to alleviate the influence of single-sample data.
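A minimal sketch of this selection rule is given below. It uses z-score normalization followed by a singular value decomposition; the normalization scheme and the function name are assumptions, since the text only states that the data were normalized before PCA.

```python
import numpy as np

def leading_pcs(X, threshold=0.95):
    """Return the scores of the leading PCs and their count k.

    X : (n_months, n_models) matrix, one column per NEX-GDDP model.
    k is the smallest number of PCs whose cumulative explained-variance
    ratio reaches the threshold. Assumes no column is constant.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)          # z-score normalization (assumed)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt are the PC directions
    ratio = s ** 2 / np.sum(s ** 2)                   # explained-variance ratio per PC
    k = int(np.searchsorted(np.cumsum(ratio), threshold) + 1)
    return Z @ Vt[:k].T, k

# Synthetic example: 540 months (1961-2005) and 21 correlated model series.
rng = np.random.default_rng(0)
signal = rng.gamma(2.0, 50.0, size=(540, 1))
X = signal + rng.normal(0.0, 5.0, size=(540, 21))
scores, k = leading_pcs(X)
```

The returned `scores` matrix plays the role of the inputs {x1, ..., xn} of the ensemble models.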

Selecting the Superior Ensemble Method from MLP, SVR, and RF
After data preprocessing, the MLP, SVR, and RF methods were each applied to ensemble the 21 NEX-GDDP models, and the performance of each method was compared. The best-performing method, SVR, was then selected. These methods have previously been applied successfully to model the relationships between local precipitation and GCM predictors [18] because they can represent highly nonlinear relationships.
MLP is a typical neural network [31] with the back-propagation (BP) training algorithm [32]. In this study, a typical three-layer MLP network was used that consisted of one input layer, one hidden layer, and one output layer. Figure 2 depicts the construction of the applied MLP network. {x1, ..., xm, ..., xn} represents the PCs of NEX-GDDP precipitation, and y represents the corresponding observed data. {h1, ..., hs} denotes the nodes of the hidden layer. Equation (1) describes the input-output equation of the applied MLP network in this study [33]:
$$y = f_o\left(\sum_{j=1}^{s} w_j \, f_h\left(\sum_{i=1}^{n} w_{ji} x_i + w_{jo}\right) + w_o\right) \qquad (1)$$
where $w_{ji}$ are the weights in the hidden layer that connect the i-th neuron in the input layer and the j-th neuron in the hidden layer, $w_{jo}$ is the bias for the j-th hidden neuron, $f_h$ is the activation function of the hidden neuron, $w_j$ is the weight between the j-th neuron in the hidden layer and the neuron in the output layer, $w_o$ is the bias for the output neuron, and $f_o$ is the activation function for the output.
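The forward pass of a three-layer network of the kind described by Equation (1) can be sketched as below. The tanh hidden activation and the identity output activation are assumptions chosen for illustration; the study selected its activation functions through hyperparameter optimization.

```python
import numpy as np

def mlp_forward(x, W_h, b_h, w_out, b_out, f_h=np.tanh, f_o=lambda z: z):
    """Single-output, one-hidden-layer MLP forward pass.

    x     : (n,) input PCs
    W_h   : (s, n) hidden-layer weights w_ji
    b_h   : (s,) hidden-layer biases w_jo
    w_out : (s,) output weights w_j
    b_out : scalar output bias w_o
    """
    hidden = f_h(W_h @ x + b_h)                # h_j = f_h(sum_i w_ji * x_i + w_jo)
    return float(f_o(w_out @ hidden + b_out))  # y = f_o(sum_j w_j * h_j + w_o)

# Toy weights, purely illustrative.
y_hat = mlp_forward(np.array([0.2, -0.1]),
                    np.array([[0.5, -0.3], [0.1, 0.8]]),
                    np.zeros(2),
                    np.array([1.0, -1.0]),
                    0.1)
```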
SVM is also a machine learning method, based on Vapnik-Chervonenkis (VC) theory and the rule of structural risk minimization [17]. SVR is the form of SVM that solves nonlinear regression problems by applying kernel functions to map low-dimensional data to a high-dimensional feature space. SVR methods have been successfully applied in precipitation downscaling [34,35], but there are no documented applications to the ensemble of multiple NEX-GDDP precipitation series. In this study, the applied SVR model can be represented by Equation (2):
$$f(x) = \sum_{i} \left(\alpha_i - \hat{\alpha}_i\right) \mathrm{Kernel}(x_i, x) + b \qquad (2)$$
where Kernel denotes the applied kernel function; $\alpha_i$ and $\hat{\alpha}_i$ denote the Lagrange multipliers that solve the optimization problem; b is a parameter; $x_i$ are the training vectors; and x is the independent vector. The parameters are derived by maximizing the objective function.
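For readers who want to see how a prediction of the form in Equation (2) is evaluated once the multipliers are known, a minimal sketch follows. The radial basis function kernel, the value of `gamma`, and all numbers are assumptions for illustration; training (solving for the multipliers) is not shown.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||a - b||^2) (assumed kernel)."""
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

def svr_predict(x, vectors, alpha, alpha_hat, b, gamma=0.5):
    """Evaluate f(x) = sum_i (alpha_i - alpha_hat_i) * Kernel(x_i, x) + b."""
    k = rbf_kernel(vectors, x, gamma)  # kernel value against every training vector
    return float(np.dot(alpha - alpha_hat, k) + b)

# Toy values, not fitted to any data.
vectors = np.array([[0.0, 0.0], [1.0, 1.0]])
alpha = np.array([2.0, 0.0])
alpha_hat = np.array([0.5, 1.0])
prediction = svr_predict(np.array([0.0, 0.0]), vectors, alpha, alpha_hat, b=1.0)
```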
In addition, RF was proposed by Breiman [36] as a machine learning algorithm. It combines multiple classification and regression trees (CART), which helps avoid over-fitting and accommodates different types of input variables. For more detail on CART analysis, refer to Breiman et al. [37]. RF generates many independent trees and then makes a final decision on the basis of its characteristics of nonparametric statistical regression and randomness. Accordingly, the decision-making ability of the RF model hinges on each CART. Using out-of-bag (OOB) samples, RF can be internally cross-validated. This study applied the OOB error ($E_{OOB}$) to estimate the internal error, represented by Equation (3):
$$E_{OOB} = \frac{1}{n} \sum_{i=1}^{n} \left[ Y(X_i) - Y_i \right]^2 \qquad (3)$$
where $Y(X_i)$ are the predicted values, $Y_i$ are the station observations, and n is the number of samples. Note that the choice of hyperparameters is important for machine learning methods. For MLP, the number of hidden layers and neurons, the activation functions, the optimization algorithm, and other settings must be chosen [38]. For SVR, the penalty factor, tolerance, and kernel function are important, and for RF, the number of trees and the maximum depth of each tree are the main hyperparameters.
In this study, Bayesian hyperparameter optimization (BHO) was used to determine the hyperparameters of the MLP, SVR, and RF ensemble models. BHO maps the hyperparameters to the corresponding scoring probability of the objective (e.g., the MSE and loss of model performance) to infer information on the unknown function [39]. The tree-structured Parzen estimator (TPE) algorithm was chosen because it has performed better for several difficult learning problems [40]. The framework of sequential model-based global optimization (SMBO) was also used in BHO. In addition, a 10-fold cross-validation was applied during hyperparameter optimization to obtain more reliable results: the dataset for the historic period of 1961-2005 was divided into 10 equal-sized sub-datasets, and there were 10 rounds of training and validation, each using 9 of the 10 sub-datasets as training data and the remaining sub-dataset for validation.
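The 10-fold cross-validation used to score a hyperparameter setting can be sketched as below. Random shuffling before splitting is an assumption (the text does not say whether folds were contiguous), and the `fit` and `predict` callables are hypothetical placeholders for any of the three ML models; a least-squares model stands in for them in the example. The returned mean squared error is the kind of score an optimizer such as TPE would minimize.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal-sized folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)  # shuffling is an assumption
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cv_mse(X, y, fit, predict, k=10):
    """Mean validation MSE over the k folds for one hyperparameter setting."""
    errors = []
    for train, val in kfold_indices(len(y), k):
        model = fit(X[train], y[train])
        errors.append(np.mean((predict(model, X[val]) - y[val]) ** 2))
    return float(np.mean(errors))

# Stand-in model: ordinary least squares on synthetic data (540 months).
rng = np.random.default_rng(42)
X = rng.normal(size=(540, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0.0, 0.1, size=540)
score = cv_mse(X, y,
               fit=lambda A, t: np.linalg.lstsq(A, t, rcond=None)[0],
               predict=lambda coef, A: A @ coef)
```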
The software used to implement BHO for MLP, SVR, and RF is introduced in Appendix A. Figures A1 and A2 in Appendix A depict the diagrams of the optimization process of the ML methods for the region mean. Tables A1-A3 in Appendix B provide the results of BHO of MLP, SVR, and RF for each model.
All ML ensemble models were established on the basis of the optimal hyperparameters; the selected PCs of the 21 NEX-GDDP precipitation variables were used as inputs to drive the models and generate the ensemble precipitation for each station. Then, on the basis of the evaluation metrics from Section 2.2.4, SVR was selected as the best ensemble method.

Combining the SVR and QM Methods
Precipitation bias remains after ensemble simulation, and thus it is important to reduce it further. QM has been successfully applied in many precipitation bias-correction studies, and it is considered the most efficient method for the task [25,26]. After the selection among the MLP, SVR, and RF ensemble methods, this study combined the SVR method for ensemble simulation and the QM method for bias correction on the basis of the 21 NEX-GDDP precipitation models. QM is a distribution-based method that is used to align the cumulative distribution functions (CDFs) of two data series [41]. Equation (4) describes the general form of QM:
$$P_q = f_{sta}^{-1}\left(f_m(p_m)\right) \qquad (4)$$
where $P_q$ is the corrected precipitation after quantile mapping, $f_{sta}^{-1}$ is the inverse CDF corresponding to observed precipitation, $f_m$ denotes the CDF of the ensemble-simulated data generated by SVR, and $p_m$ is the simulated data.
In this study, the employed QM technique was based on quantile-quantile (Q-Q) plots, which express the Q-Q relation of the modelled and observed series. The Q-Q plot is regarded as an empirically based transfer function that aligns the percentiles of the ensemble-simulated data and the observations. This study determined the transfer function on the basis of historic precipitation and then applied the function to correct the simulation of future projections. The software and packages used to implement the QM method are introduced in Appendix C.
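An empirical transfer function of this kind can be sketched with matched percentiles and linear interpolation. This is an illustrative sketch, not the software described in Appendix C: the number of quantiles and the function names are assumptions, and values outside the historical simulated range are clamped to the end points of the observed quantiles, which is one of several possible extrapolation choices.

```python
import numpy as np

def fit_quantile_map(sim_hist, obs_hist, n_quantiles=100):
    """Build the empirical Q-Q transfer function from historical series."""
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    return np.quantile(sim_hist, probs), np.quantile(obs_hist, probs)

def apply_quantile_map(values, sim_q, obs_q):
    """Map simulated values onto the observed distribution.

    np.interp interpolates linearly between matched quantiles and clamps
    values outside [sim_q[0], sim_q[-1]] to obs_q[0] or obs_q[-1].
    Assumes the simulated quantiles are increasing (few ties).
    """
    return np.interp(values, sim_q, obs_q)

# Synthetic monthly series: the "simulation" is a biased version of the "observations".
rng = np.random.default_rng(7)
obs = rng.gamma(2.0, 40.0, size=540)
sim = 0.8 * rng.gamma(2.0, 40.0, size=540) + 15.0
sim_q, obs_q = fit_quantile_map(sim, obs)
corrected_hist = apply_quantile_map(sim, sim_q, obs_q)
future_sim = 0.8 * rng.gamma(2.0, 42.0, size=120) + 15.0
corrected_future = apply_quantile_map(future_sim, sim_q, obs_q)  # same transfer function
```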

Evaluation and Projection for SVR_QM
The performance of the raw NEX-GDDP models, the MLP, SVR, and RF models, and the SVR_QM models was assessed by comparing their results with observations. Three evaluation metrics were used: Pearson's correlation coefficient (PCC), root mean squared error (RMSE), and relative bias (Rbias), the equations of which are shown in Table 4. These metrics were also used as the indicators for comparing the performance of each method. PCC evaluates the degree of linear correlation between variables; a PCC of 0 denotes no correlation, whereas 1 represents complete correlation. RMSE represents the errors between two variables; the smaller the RMSE, the better the results. Rbias evaluates the relative deviation between simulated and observed data.

Table 4. Detailed equations and variables involved in the statistical metrics (columns: Statistical Metric, Equation, Description, Unit; rows: PCC, RMSE, Rbias).

The projected precipitation from 2006 to 2095 under RCP4.5 and RCP8.5 was ensembled and corrected using the established SVR_QM models. In other words, the corresponding PCs were selected, and the established SVR and QM models were used to obtain each station's future precipitation. Then, on the basis of the modelled results for the future, the yearly trends of precipitation changes were analyzed.
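The three evaluation metrics named in this subsection can be computed as in the sketch below. The formulas are the commonly used definitions and are an assumption here; in particular, Rbias is taken as the summed difference between simulation and observation divided by the summed observation, expressed in percent.

```python
import numpy as np

def pcc(sim, obs):
    """Pearson's correlation coefficient (dimensionless)."""
    return float(np.corrcoef(sim, obs)[0, 1])

def rmse(sim, obs):
    """Root mean squared error, in the units of the data (mm here)."""
    return float(np.sqrt(np.mean((np.asarray(sim) - np.asarray(obs)) ** 2)))

def rbias(sim, obs):
    """Relative bias in percent (assumed definition: sum(sim - obs) / sum(obs) * 100)."""
    sim, obs = np.asarray(sim), np.asarray(obs)
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))

# Toy example: a simulation that is uniformly 10% too wet.
obs = np.array([10.0, 20.0, 30.0])
sim = np.array([11.0, 22.0, 33.0])
scores = {"PCC": pcc(sim, obs), "RMSE": rmse(sim, obs), "Rbias": rbias(sim, obs)}
```

The toy case shows why a high PCC can coexist with a non-zero RMSE and Rbias: a purely multiplicative bias leaves the correlation untouched.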

Validation and Comparison of the Machine Learning Ensemble Models
First, the MLP, SVR, and RF models were used for ensemble simulations. For comparison, MME was also used to ensemble the NEX-GDDP models; that is, the arithmetic mean of the precipitation values of the 21 models was taken as the ensemble simulation. Table 5 shows the simulation performance (PCC, RMSE, and Rbias) of the 21 single NEX-GDDP models and the MME ensemble model for the region mean. Given space limitations, the evaluation results for each station are presented in Table S1 in the Supplementary Materials. Each model had a certain ability to simulate the observed precipitation, although the simulation ability differed clearly among models and among stations. Models 2, 4, and 15 outperformed the other NEX-GDDP models overall, with PCCs of 0.68-0.72 and RMSEs of approximately 43-45 mm, whereas models 6, 17, and 20 performed relatively poorly, with PCCs of 0.60-0.61 and RMSEs of approximately 50-52 mm. Figure 3 depicts the Taylor diagram of the raw NEX-GDDP models, MME, and the ML ensemble models, which presents the PCC, RMSE, and standard deviation of each model relative to the observations. Generally, the closer a model is to the 'observed' point, the better its performance. The diagram shows these differences more clearly, and they are consistent with Table 5. In addition, the standard deviations of these models were close to that of the observations. Interestingly, good PCCs were accompanied by poor RMSE and Rbias values in several cases, possibly because the systematic bias of the CMIP5 models strongly affects RMSE and Rbias. Moreover, regarding the differences among stations, the simulation results of the 21 models were relatively poor for stations 1, 5, and 9 and good for stations 17, 19, and 21. This may have been due to local microclimates that the GCMs could not represent.
The microclimate is influenced by the local topography, underlying surface, and weather. Additionally, the statistical downscaling strategy used to generate NEX-GDDP from these GCMs did not consider regional climate either. This theme is worthy of further study, as the local conditions of each station are different. These results also confirm that the NEX-GDDP models have a definite simulation ability in some areas of complex terrain, in line with the conclusions of previous studies [10]. MME clearly improved on all single NEX-GDDP models. For the region mean, the PCC was improved from 0.60-0.72 to 0.75, and the RMSE was reduced to 36.68 mm. This result is also consistent with those of previous studies, although the cases and specific values are different [20].
Figure 4 depicts the PC numbers for each station, and Table 6 compares the performance of the three ML ensemble methods. SVR performed better overall than MLP and RF: for the region mean, its PCC reached 0.81 and its RMSE reached 34.24 mm, whereas the PCCs of MLP and RF were 0.77 and 0.78 and their RMSEs were 35.78 and 36.21 mm, respectively. For the 21 stations, the PCC of SVR was 0.56-0.86 and the RMSE was 37.64-80.65 mm, which was also better than MLP and RF. The results for stations 7, 14, and 18 were very good, whereas those for stations 1, 2, 3, and 9 were relatively poor. As concluded from Tables 5 and 6, all the ML ensemble models showed greatly improved performance compared with the raw NEX-GDDP simulations and also improved on MME, although the improvement over MME was smaller than that over the raw simulations. This may be because the MME ensemble was already relatively good, which made significant improvement more difficult.
A similar conclusion was reached in previous work, where SVR performed better overall than RF for GCM precipitation downscaling, although there were some opposite cases for specific stations [20]. Nevertheless, it can be concluded that SVR was more reliable for the study area or the characteristics of the data used. In future work, it is worth studying the applicability of SVR to other regions or basins. The different results at specific stations were perhaps also caused by the significant influence of unconsidered local climates at some stations. Although ML methods have been widely applied, this is the first time they have been used to ensemble NEX-GDDP precipitation. The results of this study demonstrated that uncertainties exist among the three ML ensemble methods. Generally, the modelling performance of ML methods depends on their inputs and parameters [42]. It is difficult to improve the raw quality of NEX-GDDP. However, there may be room to improve the parameter set by improving the ML algorithm and optimizing BHO. Previous research has successfully applied a multi-method ensemble strategy to reduce such uncertainties [43], which encourages further studies to test more ensemble methods and identify the most applicable one.

Validation of SVR_QM Method
According to Section 3.1, the SVR models performed best overall for the ensemble simulation of NEX-GDDP precipitation in this region. This study further applied the QM method to correct the results of the SVR models. In Equation (4), the ensemble result from SVR was regarded as the simulated data $p_m$, whereas $f_{sta}^{-1}$ is the inverse CDF corresponding to observed precipitation. Table 7 shows the results of the SVR_QM models for each station and the region mean. Satisfactory results were obtained at most stations, with PCCs of 0.58-0.85 and RMSEs of approximately 37-80 mm for the 21 stations. The performance for stations 1 and 3 was still relatively poor, whereas the results for stations 7, 14, and 18 were good. For the region mean, the PCC and RMSE reached 0.84 and 33.78 mm, respectively. More notably, Rbias was improved compared with the results of the ML methods and MME. Table 8 compares MME, MLP, SVR, RF, and SVR_QM for the region mean. Tables 6-8 show that SVR_QM had the best performance: although the improvement of PCC and RMSE was not obvious, Rbias was almost eliminated in all cases. The Rbias obtained from SVR_QM reached −0.04% for the region mean, whereas the values obtained from MME, MLP, SVR, and RF were 2.23%, −1.82%, −2.48%, and −2.21%. The limited improvement of PCC and RMSE may have been mainly due to certain defects of data quality; it is difficult to improve these metrics when the data are to some extent defective. As CMIP6 is ongoing, more reliable GCM data may be released in the future, and the correction accuracy is expected to improve on the basis of the new dataset. Figure 5 depicts the scatter plots between the monthly SVR_QM results and the observations for each station and the region mean in the period of 1961-2005. The horizontal axes show the observed precipitation, whereas the vertical axes show the SVR_QM results. The blue line represents the line 'y = x'.
The more closely the points cluster around this line, the closer the simulation is to the observations. Clearly, the degree of concentration differed among stations. The region mean was the most concentrated, and stations 7 and 18 were more concentrated than other stations, whereas stations 1 and 3 were less concentrated. In conclusion, the simulation performance was improved by SVR_QM, but some stations still exhibited relatively poor performance. These results also encourage future exploration of the influence of local climate or topography. The QM method has been shown to have a certain ability to correct NEX-GDDP precipitation, consistent with the conclusions reached for GCM precipitation [7]. However, according to Raghavan et al., the raw simulation of daily NEX-GDDP precipitation is poor [11]. Given the lack of research, it is worth attempting to apply the same framework to daily NEX-GDDP precipitation, which could provide more reliable information on future extreme rainfall and weather.

Projected Precipitation in the Han River Basin during the 21st Century under RCP4.5 and RCP8.5
The monthly rainfall simulation was converted to annual time series. The non-parametric Mann-Kendall method [44-46] was used to detect future trends of yearly precipitation. Trends were tested at three significance levels, α = 0.10, 0.05, and 0.01 (corresponding to |Z| greater than 1.28, 1.64, and 2.32, respectively). Table 9 presents the trend and the calculated Z values of annual future precipitation for each station and the region mean in the period of 2006-2095 under RCP4.5 and RCP8.5. The table indicates increasing trends at most stations under both scenarios, as the corresponding precipitation series had positive trend values. In addition, almost all of these increasing trends were significant, as the Z values were greater than 1.28. Under RCP4.5, stations 9, 11, and 15 showed the most significant increasing trends, with Z values of 2.62, 3.31, and 3.58, respectively, whereas stations 10 and 18 showed non-significant increasing trends, with Z values of 1.23 and 0.44, respectively. Under RCP8.5, stations 9, 15, and 21 showed the most significant increasing trends, with Z values of 4.14, 4.94, and 4.21, respectively, whereas stations 5 and 10 showed non-significant increasing trends. For these increasing cases, the trend significance was higher under RCP8.5 than under RCP4.5. In addition, fewer cases showed a decreasing trend, such as stations 2, 5, 6, 7, 12, and 15 under RCP4.5 and stations 2 and 6 under RCP8.5. The differences in trend among these stations may have been due to differences in local climate, and the relationship between the changing trend and the local climate is worth exploring in a future study. For the region mean, the increasing trends were very significant under RCP4.5 and RCP8.5, with trend values of 0.58 and 0.85 mm/year and Z values of 4.34 and 7.43, respectively.
Assuming 1981-2000 as the historical baseline, Table 10 shows the changes of future precipitation relative to the baseline years for each station and the region mean.
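As a rough illustration of the Mann-Kendall trend test used for Table 9, the sketch below computes the standardized statistic Z for an annual series and classifies it with the thresholds quoted in the text (1.28, 1.64, and 2.32). It omits the correction for tied values, and the use of Sen's slope as the trend magnitude is an assumption, since the text does not state how the trend values in mm/year were estimated. The function names and the toy series are hypothetical.

```python
import math


def mann_kendall_z(series):
    """Standardized Mann-Kendall statistic Z (no correction for tied values)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sen_slope(series):
    """Median of all pairwise slopes, in units per time step (assumed estimator)."""
    n = len(series)
    slopes = sorted(
        (series[j] - series[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    mid = len(slopes) // 2
    if len(slopes) % 2:
        return slopes[mid]
    return 0.5 * (slopes[mid - 1] + slopes[mid])


def significance_label(z):
    """Classify |Z| using the thresholds quoted in the text."""
    for threshold, alpha in ((2.32, "0.01"), (1.64, "0.05"), (1.28, "0.10")):
        if abs(z) > threshold:
            return "significant at alpha = " + alpha
    return "not significant"


# Hypothetical annual totals (mm) rising by 2 mm per year.
annual = [900.0 + 2.0 * year for year in range(10)]
z = mann_kendall_z(annual)
print(round(z, 2), sen_slope(annual), significance_label(z))
```

A positive Z indicates an increasing trend and a negative Z a decreasing one; the label depends only on |Z|.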
The regional average rainfall during the middle (2040-2059) and late (2070-2089) 21st century increased by 3.54% and 5.12%, respectively, compared with the base years under RCP4.5, and by 7.44% and 9.52%, respectively, under RCP8.5. Most stations showed an increase, with values ranging from 0.13% to 23.89%. Under RCP4.5 and RCP8.5, stations 6, 7, and 8 showed the largest increases, whereas stations 1, 2, 14, and 16 showed the smallest increases. In addition, there were some decreasing cases during the middle and late 21st century, especially under RCP4.5. These differences may have been due to the raw data, model uncertainty, and local climate. The uncertainty of the projections should be explored and reduced in future work.
Figure 6 shows the changes of projected future annual precipitation in the Han River basin. Under RCP4.5, rainfall during the 21st century showed a weak overall increasing trend, with a slight downward fluctuation, a weak increasing trend, and an obvious increasing trend in the periods of 2005-2040, 2041-2059, and 2070-2089, respectively. Under RCP8.5, the increase of precipitation was more pronounced after 2040, and several years, such as 2070 and 2089, showed heavy rainfall. The years of heavy rainfall are also a valuable topic for further study. Figure 7 compares the statistics of the historical baseline and the middle and late 21st century time series on the basis of quantile-quantile plots under RCP4.5 and RCP8.5. Most rainfall distributions were close to normal. Each sub-figure shows three reference lines that represent the corresponding normal distributions. The intercept and slope of these lines correspond to the mean and the standard deviation (the square root of the variance), respectively. Compared with the period of 1981-2000, the average precipitation of the mid and late 21st century under RCP4.5 and RCP8.5 clearly increased, and the variances also differed.
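The two calculations behind Table 10 and Figure 7 can be sketched as follows: the percentage change of a future period relative to the baseline mean, and a reference line fitted through a normal quantile-quantile plot, whose intercept approximates the mean and whose slope approximates the standard deviation. The function names, the plotting positions, the least-squares fit, and the synthetic data are assumptions made for illustration; the text does not state how the reference lines in Figure 7 were obtained.

```python
import random
from statistics import NormalDist, mean


def relative_change_percent(future, baseline):
    """Change of the future mean relative to the baseline mean, in percent."""
    base = mean(baseline)
    return 100.0 * (mean(future) - base) / base


def normal_qq_line(sample):
    """Least-squares line through a normal Q-Q plot.

    Returns (slope, intercept) for the sorted sample plotted against
    standard normal quantiles.
    """
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n avoid probabilities of exactly 0 or 1.
    theo = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    t_bar, y_bar = mean(theo), mean(ordered)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(theo, ordered))
    sxx = sum((t - t_bar) ** 2 for t in theo)
    slope = sxy / sxx
    return slope, y_bar - slope * t_bar


# Synthetic 20-year annual totals (mm); these are not values from the study.
rng = random.Random(42)
baseline = [rng.gauss(900.0, 100.0) for _ in range(20)]
late_century = [rng.gauss(980.0, 120.0) for _ in range(20)]

print(relative_change_percent(late_century, baseline))
print(normal_qq_line(baseline), normal_qq_line(late_century))
```

Comparing the fitted intercepts and slopes across periods gives a compact way to read the shift in mean and the change in spread from such plots.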
The projected trend of annual precipitation in the 21st century in the Han River Basin was consistent with those of previous studies, although the specific results were not the same [47,48]. This difference is acceptable because the data and study strategies differed. Precipitation may have obvious seasonality, and no measures were taken to remove it in this study; the projection of seasonal rainfall may therefore carry more uncertainty. Several studies have trained separate ensemble models for each calendar season or month [49]. To ensure a sufficient number of samples for training the SVR models, this study used the whole monthly series for modelling. Studying the changes of individual seasons or months in the 21st century is a feasible next step, which would involve exploring solutions to both the seasonality of rainfall and the limited number of samples.
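The trade-off between seasonality and sample size mentioned above can be made concrete with a small sketch that splits a monthly record into per-season or per-month training subsets and counts the samples in each. The training period (1961-2005) is taken from the text; the season grouping (DJF, MAM, JJA, SON), the record layout, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed meteorological seasons; the source does not define them.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def split_by_season(records):
    """Group (year, month, value) records by season."""
    groups = defaultdict(list)
    for year, month, value in records:
        groups[SEASON_OF_MONTH[month]].append((year, month, value))
    return dict(groups)


def split_by_month(records):
    """Group (year, month, value) records by calendar month."""
    groups = defaultdict(list)
    for year, month, value in records:
        groups[month].append((year, month, value))
    return dict(groups)


# Placeholder records for the 1961-2005 training period (values are dummies).
records = [(y, m, 0.0) for y in range(1961, 2006) for m in range(1, 13)]
by_season = split_by_season(records)
by_month = split_by_month(records)
print(len(records), len(by_season["JJA"]), len(by_month[7]))  # prints: 540 135 45
```

Training one model on the whole series uses all 540 monthly samples, whereas a per-season model would see 135 and a per-month model only 45, which illustrates why splitting by season or month makes the sample size a concern.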
Further work is planned to train monthly and seasonal models on the basis of daily data, although daily rainfall carries considerable uncertainty. Several studies have successfully assessed extreme precipitation events on the basis of daily downscaled precipitation [50]. Daily precipitation from the NEX-GDDP models is therefore also worth studying in future work.

Conclusions
It is important to understand future climate change at the local scale in the Han River basin. The release of the high-resolution downscaled NEX-GDDP dataset offers many ways to study the simulation and projection of local climate. This study first compared the abilities of three ML methods (MLP, SVR, and RF) for ensemble simulation of 21 NEX-GDDP precipitation models for the historic years of 1961-2005, with MME applied as a reference. Then, on the basis of the results of the SVR models, the QM method was used to correct the ensemble series. Finally, the SVR_QM ensemble and correction models were applied to project the change of precipitation in the period of 2006-2095 under RCP4.5 and RCP8.5 in this region. Several statistical metrics (PCC, RMSE, and Rbias) were used to evaluate and compare the performance of each method. The conclusions were as follows:
(1) The raw precipitation simulation of individual NEX-GDDP models had a certain reliability for the Han River basin: the PCC was 0.61-0.71, and the RMSE was approximately 48-51 mm. The three ML methods and MME all outperformed every individual NEX-GDDP model: the PCC improved to 0.77-0.81, and the RMSE decreased to 34-37 mm. The ML methods performed better than MME. Overall, SVR showed the best performance, with a PCC of up to 0.81 and an RMSE as low as 34.52 mm. Similar conclusions held for most individual stations, although a few stations showed contrary results. However, performance differed markedly among stations, which may have been due to the influence of the raw data, model uncertainty, and especially the local climate.
(2) Applying the QM method to the results of the SVR models further improved the reliability of the simulation. Although the improvements in PCC and RMSE were modest, Rbias was clearly reduced compared with MME, MLP, SVR, and RF. The Rbias values were reduced to between −2.04% and 0.36% for each station and to −0.04% for the region mean. The best models established on the basis of the historic series could improve the reliability of projected precipitation.
(3) Precipitation during the 21st century in this region showed a very significant increasing trend under RCP4.5 and RCP8.5, although there was a slight decreasing fluctuation in the period of 2006-2040. More specifically, compared with the base years, the regional average precipitation during the middle and late 21st century increased by 3.54% and 5.12% under RCP4.5 and by 7.44% and 9.52% under RCP8.5, respectively. In addition, increasing trends existed at most stations under RCP4.5 and RCP8.5, and most of these trends were also significant.
These results are expected to guide more accurate long-term management strategies such as water resource allocation, flood mitigation, and ecological layout.
This study first developed SVR_QM ensemble and correction models for NEX-GDDP data in the Han River basin and generated preliminary projections of changes of precipitation during the 21st century for the region, obtaining relatively satisfactory results. However, some problems remain unsolved. Further work should explore improvements to the methods and integrate the influence of local factors, followed by a study of the daily NEX-GDDP datasets.